1. Introduction

In 1988 I wrote a Ph.D. thesis entitled "Random Processes with Reinforcement".
The first section was a survey of previous work: it was under ten pages. Twenty 
years later, the field has grown substantially. In some sense it is still a collection 
of disjoint techniques. The few difficult open problems that have been solved 
have not led to broad theoretical advances. On the other hand, some nontrivial 
mathematics is being put to use in a fairly coherent way by communities of social 
and biological scientists. Though not full-time mathematicians, these scientists
are mathematically apt, and continue to draw on what theory there is. I suspect
much time is lost, Google notwithstanding, as they sift through the existing
literature and folklore in search of the right shoulders to stand on. My primary
motivation for writing this survey is to create universal shoulders: a centralized 
base of knowledge of the three or four most useful techniques, in a context of 
applications broad enough to speak to any of half a dozen constituencies of 
users. 

Such an account should contain several things. It should contain a discussion 
of the main results and methods, with sufficient sketches of proofs to give a 
pretty good idea of the mathematics involved. It should contain precise pointers
to more detailed statements and proofs, and to various existing versions of the 
results. It should be historically accurate enough not to insult anyone still living, 
while providing a modern editorial perspective. In its choice of applications it 
should winnow out the trivial while not discarding what is simple but useful. 

The resulting survey will not have the mathematical depth of many of the 
Probability Surveys. There is only one nexus of techniques, namely the stochas- 
tic approximation / dynamical system approach, which could be called a the- 
ory and which contains its own terminology, constructions, fundamental results, 
compelling open problems and so forth. There would have been two, but it seems 
that the multitype branching process approach pioneered by Athreya and Karlin 
has been taken pretty much to completion by recent work of S. Janson. 

There is one more area that seems fertile if not yet coherent, namely
reinforcement in continuous time and space. Continuous reinforcement processes
are to reinforced random walks what Brownian motion is to simple random walk,
that is to say, there are new layers of complexity. Even excluding the hot new
subfield of SLE, which could be considered a negatively reinforced process,
there are several other self-interacting diffusions and more general
continuous-time processes that open up mathematics of some depth and practical
relevance. These are not yet at the mature "surveyable" state, but a section
has been devoted to an in-progress glimpse of them.

Footnote on proof sketches: in fact, the heading "PROOF:" in this survey means
just such a sketch.

The organization of the rest of the survey is as follows. Section [2] provides an 
overview of the basic models, primarily urn models, and corresponding known 
methods of analysis. Section [3] is devoted to urn models, surveying what is 
known about some common variants. Section [4] collects applications of these
models from a wide variety of disciplines. The focus is on useful application 
rather than on new mathematics. Section [5] is devoted to reinforced random 
walks. These are more complicated than urn models and therefore less likely 
to be taken literally in applications, but have been the source of many of the 
recognized open problems in reinforcement theory. Section [6] introduces contin- 
uous reinforcement processes as well as negative reinforcement. This includes 
the self-avoiding random walk and its continuous limits, which are well studied 
in the mathematical physics literature, though not yet thoroughly understood. 

2. Overview of models and methods 

Dozens of processes with reinforcement will be discussed in the remainder of this 
survey. A difficult organizational issue has been whether to interleave general 
results and mathematical infrastructure with detailed descriptions of individual 
processes, or instead whether to lay out the bulk of the mathematics, leaving 
only some refinements to be discussed along with specific processes and ap- 
plications. Because of the way research has developed, the existing literature 
is organized mostly by application; indeed, many existing theoretical results 
are very much tailored to specific applications and are not easily discussed ab- 
stractly. It is, however, possible to describe several distinct approaches to the 
analysis of reinforcement processes. This section is meant to do so, and to serve 
as a standalone synopsis of available methodology. Thus, only the most basic urn 
processes and reinforced random walks will be introduced in this section: just 
enough to fuel the discussion of mathematical infrastructure. Four main analyti- 
cal methods are then introduced: exchangeability, branching process embedding, 
stochastic approximation via martingale methods, and results on perturbed dy- 
namical systems that extend the stochastic approximation results. Prototypical 
theorems are given in each of these four sections, and pointers are given to later 
sections where further refinements arise. 

2.1. Some basic models 

The basic building block for reinforced processes is the urn model. A
(single-urn) urn model has an urn containing a number of balls of different
types. The set of types may be finite or, in the more general models, countably
or uncountably infinite; the types are often taken to be colors, for ease of
visualization. The number of balls of each type may be a nonnegative integer or,
in the more general models, a nonnegative real number.

At each time n = 1,2,3,... a ball is drawn from the urn and its type noted. 
The contents of the urn are then altered, depending on the type that was drawn. 
In the most straightforward models, the probability of choosing a ball of a given
type is equal to the proportion of that type in the urn, but in more general
models this may be replaced by a different assumption, perhaps in a way that
depends on the time or some aspect of the past; there may be more than one
ball drawn, there may be immigration of new types, and so forth.

In this section, the discussion is limited to generalized Polya urn models, in 
which a single ball is drawn each time uniformly from the contents of the urn. 
Sections [3] and [4] review a variety of more general single-urn models. The most
general discrete-time models considered in the survey have multiple urns that 
interact with each other. The simplest among these are mean-field models, in 
which an urn interacts equally with all other urns, while the more complex have 
either a spatial structure that governs the interactions or a stochastically evolv- 
ing interaction structure. Some applications of these more complex models are 
discussed in Section [4.6]. We now define the processes discussed in this section.

Some notation in effect throughout this survey is as follows. Let
$(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space on which are defined
countably many IID random variables uniform on [0, 1]. This is all the
randomness we will need. Denote these random variables by
$\{U_{nk} : n, k \ge 1\}$ and let $\mathcal{F}_n$ denote the $\sigma$-field
$\sigma(U_{mk} : m \le n)$ that they generate. The variables
$\{U_{nk}\}_{k \ge 1}$ are the sources of randomness used to go from step
$n - 1$ to step $n$ and $\mathcal{F}_n$ is the information up to time $n$. In
this section we will need only one uniform random variable at each time $n$, so
we let $U_n$ denote $U_{n1}$. A notation that will be used throughout is
$\mathbf{1}_A$ to denote the indicator function of the event $A$, that is,

$$\mathbf{1}_A(\omega) := \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A. \end{cases}$$

Vectors will be typeset in boldface, with their coordinates denoted by
corresponding lightface subscripted variables; for example, a random sequence
of $d$-dimensional vectors $\{\mathbf{X}_n : n = 1, 2, \ldots\}$ may be written
out as $\mathbf{X}_1 := (X_{11}, \ldots, X_{1d})$ and so forth. Expectations
$\mathbb{E}(\cdot)$ always refer to the measure $\mathbb{P}$.



Polya's urn



The original Polya urn model, which first appeared in [EP23; Pól31], has an urn
that begins with one red ball and one black ball. At each time step, a ball is
chosen at random and put back in the urn along with one extra ball of the color
drawn, this process being repeated ad infinitum. We construct this recursively:
let $R_0 = a$ and $B_0 = b$ for some constants $a, b > 0$; for $n \ge 0$, let
$R_{n+1} = R_n + \mathbf{1}_{U_{n+1} \le X_n}$ and
$B_{n+1} = B_n + \mathbf{1}_{U_{n+1} > X_n}$, where $X_n := R_n/(R_n + B_n)$. We
interpret $R_n$ as the number of red balls in the urn at time $n$ and $B_n$ as the
number of black balls at time $n$. Uniform drawing corresponds to drawing a red
ball with probability $X_n$ independent of the past; this probability is generated
by our source of randomness via the random variable $U_{n+1}$, with the event
$\{U_{n+1} \le X_n\}$ being the event of drawing a red ball at step $n$.

This model was introduced by Polya to model, among other things, the spread 
of infectious disease. The following is the main result concerning this model. The 
best known proofs, whose origins are not certain [Fre65; BK64], are discussed
below.

Theorem 2.1. The random variables $X_n$ converge almost surely to a limit $X$.
The distribution of $X$ is $\beta(a, b)$, that is, it has density
$C x^{a-1}(1 - x)^{b-1}$ where $C = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}$. In
particular, when $a = b = 1$ (the case in [EP23]), the limit
variable $X$ is uniform on [0, 1].
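
The random limit in Theorem 2.1 can be seen in a quick experiment. The following is a minimal sketch of the recursive construction above, assuming Python's standard pseudo-random generator in place of the uniforms $U_n$; the function name `polya_urn_fraction` and its parameters are illustrative and not from the original text.

```python
import random


def polya_urn_fraction(a=1.0, b=1.0, steps=2000, rng=None):
    """Run a two-color Polya urn and return the final red proportion X_n."""
    rng = rng or random.Random()
    red, black = float(a), float(b)
    for _ in range(steps):
        x = red / (red + black)   # current proportion of red balls
        if rng.random() <= x:     # event {U_{n+1} <= X_n}: a red ball is drawn
            red += 1.0
        else:
            black += 1.0
    return red / (red + black)


if __name__ == "__main__":
    rng = random.Random(2024)     # fixed seed, only for reproducibility
    finals = [polya_urn_fraction(1, 1, 2000, rng) for _ in range(10)]
    print([round(x, 3) for x in finals])
```

Independent runs from the same initial urn typically settle near visibly different proportions, which is what a non-degenerate limit law looks like in practice.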

The remarkable property of Polya's urn is that it has a random limit. Those
outside of the field of probability often require a lengthy explanation in order to
understand this. The phenomenon has been rediscovered by researchers in many
fields and given many names such as "lock-in" (chiefly in economic models) and
"self-organization" (physical models and automata).



Generalized Polya urns 

Let us generalize Polya's urn in several quite natural ways. Take the number
of colors to be any integer $k \ge 2$. The number of balls of color $j$ at time $n$
will be denoted $R_{nj}$. Secondly, fix real numbers
$\{A_{ij} : 1 \le i, j \le k\}$ satisfying $A_{ij} \ge -\delta_{ij}$ where
$\delta_{ij}$ is the Kronecker delta function. When a ball of color $i$ is
drawn, it is replaced in the urn along with $A_{ij}$ balls of color $j$ for
$1 \le j \le k$. The reason to allow $A_{ii} \in [-1, 0]$ is that we may think
of not replacing (or not entirely replacing) the ball that is drawn. Formally,
the evolution of the vector $\mathbf{R}_n$ is defined by letting
$\mathbf{X}_n := \mathbf{R}_n / \sum_{j=1}^k R_{nj}$ and setting
$R_{n+1,j} = R_{nj} + A_{ij}$ for the unique $i$ with
$\sum_{t<i} X_{nt} < U_{n+1} \le \sum_{t \le i} X_{nt}$. This guarantees
that $R_{n+1,j} = R_{nj} + A_{ij}$ for all $j$ with probability $X_{ni}$ for
each $i$. A further generalization is to let $\{Y_n\}$ be IID random matrices
with mean $A$ and to take $R_{n+1,j} = R_{nj} + (Y_n)_{ij}$ where again $i$
satisfies $\sum_{t<i} X_{nt} < U_{n+1} \le \sum_{t \le i} X_{nt}$.

I will use the term generalized Polya urn scheme (GPU) to refer to the
model where the reinforcement is $A_{ij}$ and the term GPU with random
increments when the reinforcement $(Y_n)_{ij}$ involves further randomization.
Greater generalizations are possible; see the discussion of time-inhomogeneity
in Section [3.2]. Various older urn models, such as the Ehrenfest urn model
[EE07], can be cast as generalized Polya urn schemes. The earliest variant I
know of was formulated by Bernard Friedman [Fri49]. In Friedman's urn, there
are two colors; the color drawn is reinforced by $\alpha > 0$ and the color not
drawn is reinforced by $\beta$. This is a GPU with

$$A = \begin{pmatrix} \alpha & \beta \\ \beta & \alpha \end{pmatrix}.$$

Let $X_n$ denote $X_{n1}$, the proportion of red balls (balls of color 1).
Friedman analyzed three special cases. Later, David Freedman [Fre65] gave a
general analysis of Friedman's urn when $\alpha > \beta > 0$. Freedman's first
result is as follows (the paper goes on to find regions of Gaussian and
non-Gaussian behavior for $(X_n - \frac{1}{2})$).

Theorem 2.2 ([Fre65, Corollaries 3.1, 4.1 and 5.1]). The proportion $X_n$ of red
balls converges almost surely to $\frac{1}{2}$.

What is remarkable about Theorem 2.2 is that the proportion of red balls
does not have a random limit. It strikes many people as counterintuitive, after
coming to grips with Polya's urn, that reinforcing with, say, 1000 balls of the
color drawn and 1 of the opposite color should push the ratio eventually to
$\frac{1}{2}$ rather than to a random limit or to {0, 1} almost surely. The
mystery evaporates rapidly with some back-of-the-napkin computations, as
discussed in Section [2.4], or with the following observation.
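
For readers who prefer an experiment to a computation, here is a small sketch of Friedman's urn. The function name and the choice $\alpha = 2$, $\beta = 1$ are illustrative assumptions, picked so that the drift toward $\frac{1}{2}$ is visible within a modest number of steps; with very lopsided reinforcement such as 1000 versus 1 the convergence is still to $\frac{1}{2}$ but far too slow to observe this way.

```python
import random


def friedman_urn(alpha, beta, steps, rng, red=1.0, black=1.0):
    """Friedman's urn: add alpha balls of the drawn color and beta of the other."""
    for _ in range(steps):
        if rng.random() <= red / (red + black):   # a red ball is drawn
            red, black = red + alpha, black + beta
        else:                                     # a black ball is drawn
            red, black = red + beta, black + alpha
    return red / (red + black)


if __name__ == "__main__":
    rng = random.Random(11)   # fixed seed, only for reproducibility
    print([round(friedman_urn(2.0, 1.0, 20000, rng), 3) for _ in range(5)])
```

In contrast with the Polya urn, repeated runs here should all land close to one half.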

Consider now a generalized Polya urn with all the $A_{ij}$ strictly positive. The
expected number of balls of color $j$ added to the urn at time $n$ given the past is
$\sum_i X_{ni} A_{ij}$. By the Perron-Frobenius theory, there is a unique simple
eigenvalue whose left unit eigenvector $\pi$ has positive coordinates, so it
should not after all be surprising that $\mathbf{X}_n$ converges to $\pi$. The
following theorem from [AK68, Equation (33)] will be proved in Section [2.3].

Theorem 2.3. In a GPU with all $A_{ij} > 0$, the vector $\mathbf{X}_n$ converges
almost surely to $\pi$, where $\pi$ is the unique positive left eigenvector of
$A$ normalized by $|\pi| := \sum_i \pi_i = 1$.
Remark. When some of the Aij vanish, and in particular when the matrix A 
has a nontrivial Jordan block for its Perron-Frobenius eigenvalue, then more 
subtleties arise. We will discuss these in Section [3.1] when we review some results
of S. Janson. 
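
The limit vector $\pi$ of Theorem 2.3 is easy to approximate numerically. The sketch below uses plain power iteration on the left, which is one standard way to find a Perron-Frobenius eigenvector of a strictly positive matrix; it is an illustration, not the method of [AK68], and the function name is hypothetical.

```python
def left_perron_vector(A, iterations=1000):
    """Approximate the positive left eigenvector pi of A with sum(pi) = 1."""
    k = len(A)
    pi = [1.0 / k] * k
    for _ in range(iterations):
        # One step of power iteration on the left: pi <- pi A, then renormalize.
        nxt = [sum(pi[i] * A[i][j] for i in range(k)) for j in range(k)]
        total = sum(nxt)
        pi = [v / total for v in nxt]
    return pi


if __name__ == "__main__":
    # Friedman's urn with alpha = 2, beta = 1: the limit should be (1/2, 1/2).
    print(left_perron_vector([[2.0, 1.0], [1.0, 2.0]]))
```

For Friedman's matrix the symmetric vector is returned, in agreement with Theorem 2.2.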



Reinforced random walk 
The first reinforced random walk appearing in the literature was the
edge-reinforced random walk (ERRW) of [CD87]. This is a stochastic process
defined as follows. Let $G$ be a locally finite, connected, undirected graph with
vertex set $V$ and edge set $E$. Let $v \sim w$ denote the neighbor relation
$\{v, w\} \in E(G)$. Define a stochastic process $X_0, X_1, X_2, \ldots$ taking
values in $V(G)$ by the following transition rule. Let $\mathcal{G}_n$ denote
the $\sigma$-field $\sigma(X_1, \ldots, X_n)$. Let $X_0 = v$ and for $n \ge 0$, let

$$\mathbb{P}(X_{n+1} = w \mid \mathcal{G}_n) = \frac{a_n(w, X_n)}{\sum_{y \sim X_n} a_n(y, X_n)} \tag{2.1}$$

where $a_n(x, y)$ is one plus the number of previous times the edge $\{x, y\}$ has been
traversed (in either direction):

$$a_n(x, y) := 1 + \sum_{k=1}^{n-1} \mathbf{1}_{\{X_k, X_{k+1}\} = \{x, y\}}. \tag{2.2}$$
fe=i 
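Formally, we may construct such a process by ordering the neighbor set of each
vertex $v$ arbitrarily as $g_1(v), \ldots, g_{d(v)}(v)$ and taking
$X_{n+1} = g_i(X_n)$ if

$$\frac{\sum_{t < i} a_n(g_t(X_n), X_n)}{\sum_{t \le d(X_n)} a_n(g_t(X_n), X_n)} < U_{n+1} \le \frac{\sum_{t \le i} a_n(g_t(X_n), X_n)}{\sum_{t \le d(X_n)} a_n(g_t(X_n), X_n)}. \tag{2.3}$$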
In the case that $G$ is a tree, it is not hard to find multi-color Polya urns
embedded in the ERRW. For any fixed vertex $v$, the occupation measures of the
edges adjacent to $v$, when sampled at the return times to $v$, form a Polya urn
process, $\{\mathbf{X}^{(v)}_n : n \ge 0\}$. The following lemma from [Pem88a]
begins the analysis in Section [5.1] of ERRW on a tree.

Lemma 2.4. The urns $\{\mathbf{X}^{(v)}_n\}_{v \in V(G)}$ are jointly independent.

The vertex-reinforced random walk or VRRW, also due to Diaconis
and introduced in [Pem88b], is similarly defined except that the edge weights
$a_n(g_t(X_n), X_n)$ in equation (2.3) are replaced by the occupation measure at the
destination vertices:

$$a_n(g_t(X_n)) := 1 + \sum_{k=1}^{n} \mathbf{1}_{X_k = g_t(X_n)}. \tag{2.4}$$

For VRRW, for ERRW on a graph with cycles, and for the other variants of 
reinforced random walk that are defined later, there is no representation directly 
as a product of Polya urn processes or even generalized Polya urn processes, but 
one may find embedded urn processes that interact nontrivially. 
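
To make the transition rule (2.1) concrete, here is a minimal simulation sketch of ERRW on a small graph. The adjacency-dictionary representation, the function name `errw_path` and the 4-cycle example are assumptions made for illustration. The sketch reinforces every crossing from the first step onward, which may differ by one step from the indexing in (2.2).

```python
import random


def errw_path(neighbors, start, steps, rng):
    """Simulate an edge-reinforced random walk; return the path and edge weights."""
    # Every edge starts with weight 1, matching "one plus the number of crossings".
    weight = {frozenset((v, w)): 1 for v in neighbors for w in neighbors[v]}
    path = [start]
    for _ in range(steps):
        here = path[-1]
        options = neighbors[here]
        w = [weight[frozenset((here, y))] for y in options]
        nxt = rng.choices(options, weights=w, k=1)[0]   # rule (2.1)
        weight[frozenset((here, nxt))] += 1             # reinforce the crossed edge
        path.append(nxt)
    return path, weight


if __name__ == "__main__":
    square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # a 4-cycle
    path, weight = errw_path(square, 0, 1000, random.Random(7))
    print({tuple(sorted(e)): c for e, c in weight.items()})
```

A vertex-reinforced variant would keep a counter per vertex instead of per edge and weight each neighbor by its own counter, as in (2.4).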

We now turn to the various methods of analyzing these processes. These are 
ordered from the least to the most generalizable. 



2.2. Exchangeability

There are several ways to see that the sequence {Xn} in the original Polya's 
urn converges almost surely. The prettiest analysis of Polya's urn is based on 
the following lemma. 

Lemma 2.5. The sequence of colors drawn from Polya's urn is exchangeable. 
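In other words, letting $C_n = 1$ if $R_n = R_{n-1} + 1$ (a red ball is drawn) and
$C_n = 0$ otherwise, then the probability of observing the sequence
$(C_1 = \epsilon_1, \ldots, C_n = \epsilon_n)$ depends only on how many zeros and
ones there are in the sequence $(\epsilon_1, \ldots, \epsilon_n)$
but not on their order.

Proof: Let $\sum_{i=1}^n \epsilon_i$ be denoted by $k$. One may simply compute
the probabilities:

$$\mathbb{P}(C_1 = \epsilon_1, \ldots, C_n = \epsilon_n) = \frac{\prod_{i=0}^{k-1} (R_0 + i) \, \prod_{i=0}^{n-k-1} (B_0 + i)}{\prod_{i=0}^{n-1} (R_0 + B_0 + i)}. \tag{2.5}$$

□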
It follows by de Finetti's Theorem [Fel71, Section VII.4] that $X_n \to X$ almost
surely, and that conditioned on $X = p$, the $\{C_i\}$ are distributed as independent
Bernoulli random variables with mean $p$. The distribution of the limiting random
variable $X$ stated in Theorem 2.1 is then a consequence of the formula (2.5) (see,
e.g., [Fel71, VII.4] or [Dur04, Section 4.3b]).
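
The exchangeability in Lemma 2.5 can be checked exactly for short sequences. The sketch below multiplies the one-step draw probabilities using rational arithmetic and compares all orderings of a fixed multiset of colors; the function name and the default initial contents $a = b = 1$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations


def sequence_probability(colors, a=1, b=1):
    """Exact probability that the first draws are `colors` (1 = red, 0 = black)."""
    red, black = Fraction(a), Fraction(b)
    prob = Fraction(1)
    for c in colors:
        x = red / (red + black)          # chance of drawing red at this step
        prob *= x if c == 1 else 1 - x
        if c == 1:
            red += 1
        else:
            black += 1
    return prob


if __name__ == "__main__":
    orderings = set(permutations((1, 1, 1, 0, 0)))
    print({sequence_probability(seq) for seq in orderings})
```

Every ordering of three reds and two blacks yields the same probability, as the lemma asserts.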

The method of exchangeability is neither robust nor widely applicable: the 
fact that the sequence of draws is exchangeable appears to be a stroke of luck. 
The method would not merit a separate subsection were it not for two further 
appearances. The first is in the statistical applications in Section [4.2] below. The
second is in ERRW. This process turns out to be Markov-exchangeable in the
sense of [DF80], which allows an explicit analysis and leads to some interesting
open questions, also discussed in Section [5] below.



2.3. Embedding 

Embedding in a multitype branching process 

Let $\{\mathbf{Z}(t) := (Z_1(t), \ldots, Z_k(t))\}_{t \ge 0}$ be a branching
process in continuous time with $k$ types, and branching mechanism as follows.
At all times $t$, each of the $|\mathbf{Z}(t)| := \sum_{i=1}^k Z_i(t)$ particles
independently branches in the time interval $(t, t + dt]$ with probability
$a_i \, dt$, where $i$ is the type of the particle. When a particle of type $i$
branches, the collection of particles replacing it may be counted according to
type, and the law of this random integer $k$-vector is denoted $\mu_i$. For any
$a_1, \ldots, a_k > 0$ and any $\mu_1, \ldots, \mu_k$ with finite mean, such a
process is known to exist and has been constructed in, e.g., [INW66; Ath68]. We
assume henceforth for nondegeneracy that it is not possible to get from
$|\mathbf{Z}(t)| > 0$ to $|\mathbf{Z}(t)| = 0$ and that it is possible to go from
$|\mathbf{Z}_t| = 1$ to $|\mathbf{Z}_t| = n$ for all sufficiently large $n$. We
will often also assume that the states form a single irreducible aperiodic class.

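Let $0 < \tau_1 < \tau_2 < \cdots$ denote the times of successive branching; our
assumptions imply that for all $n$, $\tau_n < \infty = \sup_m \tau_m$. We
examine the process $\mathbf{X}_n := \mathbf{Z}(\tau_n)$. The evolution of
$\{\mathbf{X}_n\}$ may be described as follows. Let
$\mathcal{F}_n = \sigma(\mathbf{X}_1, \ldots, \mathbf{X}_n)$. Then

$$\mathbb{P}(\mathbf{X}_{n+1} = \mathbf{X}_n + \mathbf{v} \mid \mathcal{F}_n) = \sum_{i=1}^k \frac{a_i X_{n,i}}{\sum_{j=1}^k a_j X_{n,j}} \, F_i(\mathbf{v} - \mathbf{e}_i),$$

where $\mathbf{e}_i$ denotes the $i$th standard basis vector. The quantity
$\frac{a_i X_{n,i}}{\sum_{j=1}^k a_j X_{n,j}}$ is the probability that the next
particle to branch will be of type $i$. When $a_i = 1$ for all $i$, the type of
the next particle to branch is distributed proportionally to its representation
in the population. Thus, $\{\mathbf{X}_n\}$ is a GPU with random increments. If
we further require $F_i$ to be deterministic, namely a point mass at some vector
$(A_{i1}, \ldots, A_{ik})$, then we have a classical GPU.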

The first people to have exploited this correspondence to prove facts about
GPU's were Athreya and Karlin in [AK68]. On the level of strong laws, results
about $\mathbf{Z}(t)$ transfer immediately to results about
$\mathbf{X}_n = \mathbf{Z}(\tau_n)$. Thus, for example, the fact that
$\mathbf{Z}(t) e^{-\lambda_1 t}$ converges almost surely to a random multiple
of the Perron-Frobenius eigenvector of the mean matrix $A$ [Ath68, Theorem 1],
where $\lambda_1$ is the Perron-Frobenius eigenvalue, gives a proof of
Theorem 2.3. Distributional results about $\mathbf{Z}(t)$ do not transfer
to distributional results about $\mathbf{X}_n$ without some further regularity
assumptions; see Section [3.1] for further discussion.



Embedding via exponentials 

A special case of the above multitype branching construction yields the classical 
Polya urn. Each particle independently gives birth at rate 1 to a new particle 
of the same color (or equivalently, disappears and gives birth to two particles of 
the original color). This provides yet another means of analysis of the classical 
Polya urn, and new generalizations follow. In particular, the collective birth 
rate of color $i$ may be taken to be a function $f(Z_i)$ depending on the number of
particles of color $i$ (but on no other color). Sampling at birth times then yields
the dynamic $\mathbf{X}_{n+1} = \mathbf{X}_n + \mathbf{e}_i$ with probability
$f(X_{ni}) / \sum_{j=1}^k f(X_{nj})$, where $\mathbf{e}_i$ is the $i$th standard
basis vector. Herman Rubin was the first to recognize that this dynamic may be
de-coupled via the above embedding into independent exponential processes. His
observations were published by B. Davis [Dav90] and are discussed in
Section [3.2] in connection with a generalized urn model.
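
A minimal sketch of this decoupling is given below, under the assumption that $f$ is strictly positive on the counts that occur. Each color runs its own clock, with an exponential waiting time of rate $f(m)$ while the color has $m$ particles, and the urn draws are read off by merging the clocks in time order. The function name `rubin_embedding` and its signature are hypothetical.

```python
import random


def rubin_embedding(f, initial, draws, rng):
    """Generate urn draws by merging independent exponential birth clocks."""
    counts = list(initial)
    # Next birth time of each color; the gap at count m is Exponential(rate f(m)).
    next_birth = [rng.expovariate(f(m)) for m in counts]
    order = []
    for _ in range(draws):
        i = min(range(len(counts)), key=lambda j: next_birth[j])  # earliest clock
        order.append(i)
        counts[i] += 1
        next_birth[i] += rng.expovariate(f(counts[i]))  # schedule the next birth
    return order, counts


if __name__ == "__main__":
    # f(m) = m gives the classical Polya urn; f(m) = m**2 is a superlinear variant.
    order, counts = rubin_embedding(lambda m: float(m) ** 2, (1, 1), 200,
                                    random.Random(3))
    print(counts)
```

The point of the construction is that the clocks of different colors never interact, so questions about the urn become questions about independent sums of exponentials.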

To illustrate the versatility of embedding, I include an interesting, if not
particularly consequential, application. The so-called OK Corral process is a
shootout in which, at time $n$, there are $X_n$ good cowboys and $Y_n$ bad cowboys.
Each cowboy is equally likely to land the next successful shot, killing a cowboy on
the opposite side. Thus the transition probabilities are
$(X_{n+1}, Y_{n+1}) = (X_n - 1, Y_n)$ with probability $Y_n/(X_n + Y_n)$ and
$(X_{n+1}, Y_{n+1}) = (X_n, Y_n - 1)$ with probability $X_n/(X_n + Y_n)$. The
process stops when $(X_n, Y_n)$ reaches $(0, S)$ or $(S, 0)$ for some integer
$S > 0$. Of interest is the distribution of $S$, starting from, say, the state
$(N, N)$. It turns out (see [KV03]) that the trajectories of the OK Corral
process are distributed exactly as time-reversals of the Friedman urn
process in which $\alpha = 0$ and $\beta = 1$, that is, a ball is added of the
color opposite to the color drawn. The correct scaling of $S$ was known to be
$N^{3/4}$ [WM98; Kin99]. By embedding in a branching process, Kingman and Volkov
were able to compute the leading term asymptotic for individual probabilities of
$S = k$ with $k$ on the order of $N^{3/4}$.
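
The OK Corral dynamics are simple enough to simulate directly. The following sketch follows the transition probabilities just stated; the function name and the sample sizes are illustrative, and the code makes no attempt to reproduce the asymptotics of [KV03].

```python
import random


def ok_corral(n, rng):
    """Run the OK Corral shootout from (n, n); return the number of survivors S."""
    x, y = n, n
    while x > 0 and y > 0:
        # Each cowboy is equally likely to fire the next successful shot.
        if rng.random() < y / (x + y):   # a bad cowboy shoots: a good one dies
            x -= 1
        else:                            # a good cowboy shoots: a bad one dies
            y -= 1
    return x + y


if __name__ == "__main__":
    rng = random.Random(5)               # fixed seed, only for reproducibility
    n = 10000
    survivors = [ok_corral(n, rng) for _ in range(20)]
    print(sorted(survivors), "compare with n ** 0.75 =", round(n ** 0.75))
```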



2.4. Martingale methods and stochastic approximation

Let $\{\mathbf{X}_n : n \ge 0\}$ be a stochastic process in the euclidean space
$\mathbb{R}^n$ and adapted to a filtration $\{\mathcal{F}_n\}$. Suppose that
$\mathbf{X}_n$ satisfies

$$\mathbf{X}_{n+1} - \mathbf{X}_n = \frac{1}{n} \bigl( F(\mathbf{X}_n) + \xi_{n+1} + R_n \bigr), \tag{2.6}$$

where $F$ is a vector field on $\mathbb{R}^n$,
$\mathbb{E}(\xi_{n+1} \mid \mathcal{F}_n) = 0$ and the remainder terms
$R_n \in \mathcal{F}_n$ go to zero and satisfy
$\sum_{n=1}^{\infty} n^{-1} |R_n| < \infty$ almost surely. Such a
process is known as a stochastic approximation process after [RM51]; they
used this to approximate the root of an unknown function in the setting where
evaluation queries may be made but the answers are noisy.
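
As an illustration of the root-finding use, the sketch below iterates the recursion (2.6) with $R_n = 0$ in one dimension. The target $F(x) = 2 - x$, the uniform noise and the function names are assumptions chosen for the example rather than anything taken from [RM51].

```python
import random


def robbins_monro(noisy_f, x0, steps):
    """Iterate x_{n+1} = x_n + (1/n) * noisy_f(x_n), i.e. (2.6) with R_n = 0."""
    x = x0
    for n in range(1, steps + 1):
        x += noisy_f(x) / n
    return x


if __name__ == "__main__":
    rng = random.Random(1)   # fixed seed, only for reproducibility

    def noisy_f(x):
        # Noisy evaluation of F(x) = 2 - x; the noise has mean zero.
        return (2.0 - x) + rng.uniform(-1.0, 1.0)

    print(robbins_monro(noisy_f, x0=10.0, steps=20000))
```

For this linear $F$ the iterate after $n$ steps is exactly 2 plus the average of the first $n$ noise terms, so the estimate approaches the root at the usual $n^{-1/2}$ rate.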

Stochastic approximations arise in urn processes for the following reason. The
probability distributions, $Q_n$, governing the color of the next ball chosen are
typically defined to depend on the content vector $\mathbf{R}_n$ only via its
normalization $\mathbf{X}_n$. If $b$ new balls are added to $N$ existing balls,
the resulting increment $\mathbf{X}_{n+1} - \mathbf{X}_n$ is exactly
$\frac{b}{b+N} (\mathbf{Y}_n - \mathbf{X}_n)$ where $\mathbf{Y}_n$ is the
normalized vector of added balls. Since $b$ is of constant order and $N$ is of
order $n$, the mean increment is

$$\mathbb{E}(\mathbf{X}_{n+1} - \mathbf{X}_n \mid \mathcal{F}_n) = \frac{1}{n} \bigl( F(\mathbf{X}_n) + O(n^{-1}) \bigr)$$

where $F(\mathbf{X}_n) = \mathbb{E}_{Q_n}(\mathbf{Y}_n - \mathbf{X}_n)$. Defining
$\xi_{n+1}$ to be $n$ times the martingale increment
$\mathbf{X}_{n+1} - \mathbb{E}(\mathbf{X}_{n+1} \mid \mathcal{F}_n)$ recovers
(2.6). Various recent analyses have allowed scaling such as $n^{-\gamma}$ in
place of $n^{-1}$ in equation (2.6) for $\frac{1}{2} < \gamma \le 1$, or more
generally, in place of $n^{-1}$, any constants $\gamma_n$ satisfying

$$\sum_n \gamma_n = \infty \tag{2.7}$$

and

$$\sum_n \gamma_n^2 < \infty. \tag{2.8}$$

These more general schemes do not arise in urn and related reinforcement
processes, though some of these processes require the slightly greater generality
where $\gamma_n$ is a random variable in $\mathcal{F}_n$ with
$\gamma_n = O(1/n)$ almost surely. Because a number of available results are
not known to hold under (2.7)-(2.8), the term stochastic approximation will be
reserved for processes satisfying (2.6).

Stochastic approximations arising from urn models with $d$ colors have the
property that $\mathbf{X}_n$ lies in the simplex
$\Delta^{d-1} := \{x \in (\mathbb{R}^+)^d : \sum_{i=1}^d x_i = 1\}$. The
vector field $F$ maps $\Delta^{d-1}$ to
$T\Delta := \{x \in \mathbb{R}^d : \sum_{i=1}^d x_i = 0\}$. In the two-color
case ($d = 2$), the $X_n$ take values in [0, 1] and $F$ is a univariate function
on [0, 1]. We discuss this case now, then in the next subsection take up the
geometric issues arising when $d \ge 3$.

Lemma 2.6. Let the scalar process $\{X_n\}$ satisfy (2.7)-(2.8) and suppose
$\mathbb{E}(\xi_{n+1}^2 \mid \mathcal{F}_n) \le K$ for some finite $K$. Suppose
$F$ is bounded and $F(x) < -\delta$ for $a_0 < x < b_0$ and some $\delta > 0$.
Then for any $[a, b] \subseteq (a_0, b_0)$, with probability 1 the process
$\{X_n\}$ visits $[a, b]$ only finitely often. The same holds if $F > \delta$
on $(a_0, b_0)$.
Proof: by symmetry we need only consider the case $F < -\delta$ on $(a_0, b_0)$.
There is a semi-martingale decomposition $X_n = T_n + Z_n$ where

$$T_n = X_1 + \sum_{k=1}^{n-1} \gamma_k \bigl( F(X_k) + R_k \bigr)$$

and

$$Z_n = \sum_{k=1}^{n-1} \gamma_k \, \xi_{k+1}$$

are respectively the predictable and martingale parts of $X_n$; here $\gamma_k$
denotes the step size, equal to $1/k$ in (2.6). Square summability of the
scaling constants (2.8) implies that $Z_n$ converges almost surely. By
assumption, $\bar{R}_n := \sum_{k=1}^{n-1} \gamma_k R_k$ converges almost
surely. Thus there is an almost surely finite $N(\omega)$ with

$$|Z_n + \bar{R}_n - (Z_\infty + \bar{R}_\infty)| < \tfrac{1}{2} \min\{a - a_0, b_0 - b\}$$

for all $n \ge N$. No segment of the trajectory of $\{X_{N+k}\}$ can increase by
more than $\frac{1}{2} \min\{a - a_0, b_0 - b\}$ while staying inside
$[a_0, b_0]$. When $N$ is sufficiently large, the trajectory $\{X_{N+k}\}$ may
not jump from $[a, b]$ to the right of $b_0$ nor from the left of $a_0$ to
$[a, b]$. The lemma then follows from the observation that for $n > N$, the
trajectory if started in $[a, b]$ must exit $[(a + a_0)/2, b]$ to the left
and may then never return to $[a, b]$. □

Corollary 2.7. If F is continuous then Xn converges almost surely to the zero 
set of F. 

Proof: consider the sub-intervals $[a, b]$ of intervals $(a_0, b_0)$ on which
$F > \delta$ or $F < -\delta$. Countably many of these cover the complement of
the zero set of $F$ and each is almost surely excluded from the limit set of
$\{X_n\}$. □

This generalizes a result proved by [HLS80]. They generalized Polya's urn
so that the probability of drawing a red ball was not the proportion $X_n$ of red
balls in the urn but $f(X_n)$ for some prescribed $f$. This leads to a stochastic
approximation process with $F(x) = f(x) - x$. They also derived convergence
results for discontinuous $F$ (the arguments for the continuous case work unless
points where $F$ oscillates in sign are dense in an interval) and showed
Theorem 2.8 ([HLS80, Theorem 4.1]). Suppose there is a point $p$ and an
$\epsilon > 0$ with $F(p) = 0$, $F > 0$ on $(p - \epsilon, p)$ and $F < 0$ on
$(p, p + \epsilon)$. Then $\mathbb{P}(X_n \to p) > 0$. Similarly, if $F < 0$ on
$(0, \epsilon)$ or $F > 0$ on $(1 - \epsilon, 1)$, then there is a positive
probability of convergence to 0 or 1 respectively.

Proof, if $F$ is continuous: Suppose $0 < p < 1$ satisfies the hypotheses of the
theorem. By Corollary 2.7, $X_n$ converges to the union of $\{p\}$ and
$(p - \epsilon, p + \epsilon)^c$. On the other hand, the semi-martingale
decomposition shows that if $X_N$ is in a smaller neighborhood of $p$ and $N$ is
sufficiently large, then $\{X_{N+k}\}$ cannot escape
$(p - \epsilon, p + \epsilon)$. The cases $p = 0$ and $p = 1$ are similar. □
It is typically possible to find more martingales, special to the problem at
hand, that help to prove such things. For the Friedman urn, in the case
$\alpha > 3\beta$, it is shown in [Fre65, Theorem 3.1] that the quantity
$Y_n := C_n (R_n - B_n)$ is a martingale when $\{C_n\}$ are constants asymptotic
to $n^{-\rho}$ for $\rho := (\alpha - \beta)/(\alpha + \beta)$. Similar
computations for higher moments show that $\liminf Y_n > 0$, whence
$R_n - B_n = \Theta(n^{\rho})$.

Much recent effort has been spent obtaining some kind of general hypotheses 
under which convergence can be shown not to occur at points from which the 
process is being "pushed away". Intuitively, it is the noise of the process that
prevents it from settling down at an unstable zero of F, but it is difficult to find 
the right conditions on the noise and connect them rigorously to destabilization 
of unstable equilibria. The proper context for a full discussion of this is the next 
subsection, in which the geometry of vector flows and their stochastic analogues 
is discussed, but we close here with a one-dimensional result that underlies 
many of the multi-dimensional results. The result was proved in various forms
in [Pem88b; Pem90a].



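Theorem 2.9 (nonconvergence to unstable equilibria). Suppose $\{X_n\}$ satisfies
the stochastic approximation equation (2.6) and that for some $p \in (0, 1)$ and
$\epsilon > 0$, $\operatorname{sgn} F(x) = \operatorname{sgn}(x - p)$ for all
$x \in (p - \epsilon, p + \epsilon)$. Suppose further that
$\mathbb{E}(\xi_{n+1}^+ \mid \mathcal{F}_n)$ and
$\mathbb{E}(\xi_{n+1}^- \mid \mathcal{F}_n)$ are bounded above and below by
positive numbers when $X_n \in (p - \epsilon, p + \epsilon)$. Then
$\mathbb{P}(X_n \to p) = 0$.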

Proof:

Step 1: it suffices to show that there is an $\epsilon > 0$ such that for every
$n$, $\mathbb{P}(X_k \to p \mid \mathcal{F}_n) < 1 - \epsilon$ almost surely.
Proof: A standard fact is that
$\mathbb{P}(X_k \to p \mid \mathcal{F}_n) \to 1$ almost surely on the event
$\{X_k \to p\}$ (this holds for any event $A$ in place of $\{X_k \to p\}$). In
particular, if $\mathbb{P}(X_k \to p) = a > 0$ then for any $\epsilon > 0$ there
is some $n$ such that
$\mathbb{P}(X_k \to p \mid \mathcal{F}_n) > 1 - \epsilon$ on a set of measure at
least $a/2$. Thus $\mathbb{P}(X_k \to p) > 0$ is incompatible with
$\mathbb{P}(X_k \to p \mid \mathcal{F}_n) < 1 - \epsilon$ almost surely for
every $n$.

Step 2: with probability $\epsilon$, given $\mathcal{F}_n$, $X_{n+k}$ may wander
away from $p$ by $c n^{-1/2}$ due to noise. Proof: Let $\tau$ be the exit time
of the interval $(p - c n^{-1/2}, p + c n^{-1/2})$. Then
$\mathbb{E}(X_\tau - p)^2 \le c^2 n^{-1}$. On the other hand, the quadratic
variation of $\{(X_{n \wedge \tau} - p)^2\}$ increases by $\Theta(n^{-2})$ at
each step, so on $\{\tau = \infty\}$ is $\Theta(n^{-1})$. If $c$ is small
enough, then we see that the event $\{\tau = \infty\}$ must fail at least
$\epsilon$ of the time.

Step 3: with probability $\epsilon$, $X_{\tau+k}$ may then fail to return to
$(p - c n^{-1/2}/2, p + c n^{-1/2}/2)$, due to the drift overcoming the noise.
Proof: Suppose without loss of generality that $X_\tau < p - c n^{-1/2}$. The
quadratic variation of the supermartingale $\{X_{\tau+k}\}$ is $O(\tau^{-1})$,
hence $O(n^{-1})$. The probability of such a supermartingale increasing by
$c n^{-1/2}/2$ is bounded away from 1. □

As an example, apply this to the urn process of [HLS80], choosing the urn
function to be given by $f(x) = 3x^2 - 2x^3$. This corresponds to choosing the
color of each draw to be the majority out of three draws sampled with
replacement. Here, it may easily be seen that $F < 0$ on $(0, \frac{1}{2})$ and
$F > 0$ on $(\frac{1}{2}, 1)$. Verifying the hypotheses on $\xi$, we find that
convergence to $\frac{1}{2}$ is impossible, so $X_n \to 0$ or 1 almost surely.
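
The majority-of-three urn is easy to try out. In the sketch below the urn function is the one from the example, while the simulation wrapper, its name and the starting contents of one ball of each color are assumptions made for illustration. One would expect typical runs to drift toward 0 or 1, though a finite simulation can only suggest this.

```python
import random


def urn_function(x):
    """Probability that at least two of three draws with replacement are red."""
    return 3 * x**2 - 2 * x**3


def majority_urn(steps, rng, red=1, black=1):
    """Urn in which a red ball is added with probability f(X_n) rather than X_n."""
    for _ in range(steps):
        x = red / (red + black)
        if rng.random() < urn_function(x):
            red += 1
        else:
            black += 1
    return red / (red + black)


if __name__ == "__main__":
    rng = random.Random(9)   # fixed seed, only for reproducibility
    print([round(majority_urn(50000, rng), 3) for _ in range(8)])
```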
2.5. Dynamical systems and their stochastic counterparts

In a vein of research spanning the 1990's and continuing through the present, 
Benaïm and collaborators have formulated an approach to stochastic
approximations based on notions of stability for the approximating ODE. This
section describes the dynamical system approach. Much of the material here is
taken from the survey [Ben99].



The dynamical system heuristic 

For processes in any dimension obeying the stochastic approximation
equation (2.6) there are two natural heuristics. Sending the noise and remainder
terms to zero yields a difference equation
$\mathbf{X}_{n+1} - \mathbf{X}_n = \frac{1}{n} F(\mathbf{X}_n)$ and
approximating $\sum_{k=1}^n \frac{1}{k}$ by the continuous variable $\log t$
yields the differential equation

$$\frac{d\mathbf{X}}{dt} = F(\mathbf{X}). \tag{2.9}$$

The first heuristic is that trajectories of the stochastic approximation
$\{\mathbf{X}_n\}$ should approximate trajectories of the ODE
$\{\mathbf{X}(t)\}$. The second is that stable trajectories of the ODE should
show up in the stochastic system, but unstable trajectories should not.
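
The ODE side of the heuristic can be explored numerically. The sketch below integrates (2.9) by Euler steps for the one-dimensional drift $F(x) = f(x) - x$ of the majority-of-three urn function $f(x) = 3x^2 - 2x^3$ from Section 2.4; the step size, time horizon and function names are illustrative assumptions.

```python
def euler_flow(F, x0, t_end, dt=0.01):
    """Integrate dX/dt = F(X) from X(0) = x0 up to time t_end by Euler steps."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        x += dt * F(x)
    return x


def F(x):
    # Drift of the majority-of-three urn: f(x) - x with f(x) = 3x^2 - 2x^3.
    return 3 * x**2 - 2 * x**3 - x


if __name__ == "__main__":
    for start in (0.4, 0.5, 0.6):
        print(start, "->", euler_flow(F, start, t_end=30.0))
```

Starting points on either side of $\frac{1}{2}$ flow to the stable equilibria 0 and 1, while the trajectory started exactly at the unstable equilibrium $\frac{1}{2}$ stays put, which is precisely the kind of trajectory the second heuristic says should not appear in the noisy system.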

A complicating factor in the analysis is the possibility that the trajectories of 
the ODE are themselves difficult to understand or classify. A standard battery 
of examples from the dynamical systems literature shows that, once the di- 
mension is greater than one, complicated geometry may arise such as spiraling 
toward cyclic orbits, orbit chains punctuated by fixed points, and even chaotic 
trajectories. Successful analysis, therefore, must have several components. First, 
definitions and results are required in order to understand the forward
trajectories of dynamical systems; see the notions of $\omega$-limit sets (forward limit sets)
and attractors, below. Next, the notion of trajectory must be generalized to take 
into account perturbation; see the notions of chain recurrence and chain transi- 
tivity below. These topological notions must be further generalized to allow for 
the kind of perturbation created by stochastic approximation dynamics; see the 
notion of asymptotic pseudotrajectory below. Finally, with the right definitions 
in hand, one may prove that a stochastic approximation process {X„} docs in 
fact behave as an asymptotic pseudotrajectory, and one may establish, under 
the appropriate hypotheses, versions of the stability heuristic. 

It should be noted that an early body of literature exists in which simplifying
assumptions preclude flows with the worst geometries. The most common
simplifying assumption is that $F = -\nabla V$ for some function $V$, which we think
of as a potential. In this case, all trajectories of $X(t)$ lead "downhill" to the set
of local minima of $V$. From the viewpoint of stochastic processes obeying (2.6)
that arise in reinforcement models, the assumption $F = -\nabla V$ is quite strong.
Recall, however, that the original stochastic approximation processes were designed
to locate points such as constrained minima, in which
case $F$ is the negative gradient of the objective function. Thus, as pointed
out in [BH95; Ben99], much of the early work on stochastic approximation
processes focused exclusively on geometrically simple cases such as gradient
flow [KC78; BMP90] or attraction to a point [AEK83]. Stochastic approximation
processes in the absence of Lyapunov functions can and do follow limit
cycles; the earliest natural example I know is found in [Ben97].



Topological notions 

Although all our flows come from differential equations on real manifolds, many
of the key notions are purely topological. A flow on a topological space $M$ is
a continuous map $(t, x) \mapsto \Phi_t(x)$ from $\mathbb{R} \times M$ to $M$ such that $\Phi_0(x) = x$ and
$\Phi_{s+t}(x) = \Phi_t(\Phi_s(x))$ (note that negative times are allowed). The relation to
ordinary differential equations is that any bounded Lipschitz vector field $F$ on
$\mathbb{R}^n$ has unique integral curves and therefore defines a unique flow $\Phi$ for which
$(d/dt)\Phi_t(x) = F(\Phi_t(x))$; we call this the flow associated to $F$. We will assume
hereafter that $M$ is compact, our chief example being the $d$-simplex in $\mathbb{R}^{d+1}$. The
following constructions and results are due mostly to Bowen and Conley and are
taken from Conley's CBMS lecture notes [Con78]. The notions of forward (and
backward) limit sets and attractors (and repellers) are old and well known.
For any set $Y \subseteq M$ define the forward limit set by

$$\omega(Y) := \bigcap_{t \geq 0} \overline{\bigcup_{s \geq t} \Phi_s(Y)}. \qquad (2.10)$$

When $Y = \{y\}$, this is the set of limit points of the forward trajectory from $y$.
Limit sets for sample trajectories will be defined in (2.11) below; a key result will
be to relate these to the forward limit sets of the corresponding flow. Reversing
time in (2.10), the backward limit set is denoted $\alpha(Y)$.

An attractor is a set $A$ that has a neighborhood $U$ such that $\omega(U) = A$.
A repeller is the time-reversal of this, replacing $\omega(U)$ by $\alpha(U)$. The set $\Lambda_0$ of
rest points is the set $\{x \in M : \Phi_t(x) = x \text{ for all } t\}$.

Conley then defines the chain relation on $M$, denoted $\to$. Say that $x \to y$ if
for all $t > 0$ and all open covers $\mathcal{U}$ of $M$, there is a sequence $x = z_0, z_1, \ldots,
z_n = y$ of some length $n$ and numbers $t_1, \ldots, t_n \geq t$ such that $\Phi_{t_i}(z_{i-1})$ and $z_i$
are both in some $U \in \mathcal{U}$. In the metric case, this is easier to parse: one must
be able to get from $x$ to $y$ by a sequence of arbitrarily long flows separated
by arbitrarily small jumps. The chain recurrent set $R := R(M, \Phi)$ is defined
to be the set $\{x \in M : x \to x\}$. The set $R$ is a compact set containing all
rest points of the flow (points $x$ such that $\Phi_t(x) = x$ for all $t$), all closures
of periodic orbits, and in general all forward and backward limit sets $\omega(y)$ and
$\alpha(y)$ of trajectories.

An invariant set $S$ (a union of trajectories) is called (internally) chain
recurrent if $x \to_S x$ for all $x \in S$, where $\to_S$ denotes the chain relation for
the flow restricted to $S$. It is called (internally) chain transitive if
$x \to_S y$ for all $x, y \in S$. The following equivalence from [Bow75] helps to keep
straight the relations between these definitions.

Proposition 2.10 ([Ben99, Proposition 5.3]). The following are equivalent
conditions on a set $S \subseteq M$.

1. S is chain transitive; 

2. S is chain recurrent and connected; 

3. S is a closed invariant set and the flow restricted to S has no attractor 
other than S itself. 

□ 

Example 2.1. Consider the flow on the circle shown on the left-hand side of
Figure 1. It moves strictly clockwise except at two rest points, $a$ and $b$. Allowing
small errors, one need not become stuck at the rest points. The flow is chain
recurrent and the only attractor is the whole space. Reversing the flow on the
western meridian results in the right-hand figure. Now the point $a$ is a repeller,
$b$ is an attractor, the height is a strongly gradient-like function, and the chain
recurrent set is $\{a, b\}$.
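The two flows of this example can be explored on a grid. The sketch below is a finite stand-in for the chain relation, under several assumptions that are not in the text: the circle is parameterized by an angle, $a$ sits at angle 0 and $b$ at angle $\pi$, the speeds are $\sin^2\theta$, and "arbitrarily long flows separated by arbitrarily small jumps" is replaced by one fixed flow time and one fixed jump size. A grid cell is reported as recurrent when it can be reached from itself by such flow-then-jump steps.

```python
import math

TWO_PI = 2.0 * math.pi

def left_field(theta):
    # Moves in one direction except at the rest points 0 and pi.
    return math.sin(theta) ** 2

def right_field(theta):
    # Same speed, reversed on the arc (pi, 2*pi): 0 repels, pi attracts.
    return math.sin(theta) * abs(math.sin(theta))

def flow(field, theta, duration, steps=200):
    """Approximate the time-`duration` flow map with classical RK4."""
    h = duration / steps
    for _ in range(steps):
        k1 = field(theta)
        k2 = field(theta + 0.5 * h * k1)
        k3 = field(theta + 0.5 * h * k2)
        k4 = field(theta + h * k3)
        theta += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return theta % TWO_PI

def circle_distance(a, b):
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)

def chain_recurrent_cells(field, n_cells=120, duration=1.0, jump=0.1):
    """Return indices of grid cells lying on a closed flow-then-jump chain."""
    centers = [TWO_PI * i / n_cells for i in range(n_cells)]
    # Edge i -> j: flow from cell i for `duration`, then jump at most `jump`.
    successors = []
    for theta in centers:
        landed = flow(field, theta, duration)
        successors.append([j for j, c in enumerate(centers)
                           if circle_distance(c, landed) <= jump])
    recurrent = set()
    for start in range(n_cells):
        seen, stack = set(), list(successors[start])
        while stack:  # depth-first search over chains of length >= 1
            j = stack.pop()
            if j not in seen:
                seen.add(j)
                stack.extend(successors[j])
        if start in seen:
            recurrent.add(start)
    return recurrent
```

With these parameters, `chain_recurrent_cells(left_field)` should return every cell, while `chain_recurrent_cells(right_field)` should return only cells close to the two rest points; shrinking `jump` and increasing `duration` (on a correspondingly finer grid) tightens those neighborhoods.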



Fig 1. Two flows on $S^1$



As we have seen, the geometry is greatly simplified when $F = -\nabla V$. Although
this requires differential structure, there is a topological notion that captures
the essence. Say that a flow $\{\Phi_t\}$ is gradient-like if there is a continuous
real function $V : M \to \mathbb{R}$ such that $V$ is strictly decreasing along non-constant
trajectories. Equation (f) of [Con78, 1.5] shows that being gradient-like is strictly
weaker than being topologically equivalent to an actual gradient. If in addition,
the set $R$ is totally disconnected (hence equal to the set of rest points), then the
flow is called strongly gradient-like.

Chain recurrence and gradient-like behavior are in some sense the only two 
possible phenomena. In a gradient-like flow, one can only flow downward. In a 
chain-recurrent flow, any function weakly decreasing on orbits must in fact be 
constant on components. Although we will not need the following result, it does 
help to increase understanding.

Theorem 2.11 ([Con78, page 17]). Every flow on a compact space $M$ is uniquely
represented as the extension of a chain recurrent flow by a strongly gradient-like flow.
That is, there is a unique subflow (the flow restricted to $R$) which is chain
recurrent and for which the quotient flow (collapsing components of $R$ to a point)
is strongly gradient-like. □



Probabilistic analysis 



An important notion, introduced by Benaïm and Hirsch [BH96], is the asymptotic
pseudotrajectory. A metric is used in the definition, although it is
pointed out in [BLR02, pages 13-14] that the property depends only on the
topology, not the metric.

Definition 2.12 (asymptotic pseudotrajectories). Let $(t, x) \mapsto \Phi_t(x)$ be a flow
on a metric space $M$. For a continuous trajectory $X : \mathbb{R}_+ \to M$, let

$$d_{\Phi,t,T}(X) := \sup_{0 \leq h \leq T} d(X(t + h), \Phi_h(X(t)))$$

denote the greatest divergence over the time interval $[t, t + T]$ between $X$ and the
flow $\Phi$ started from $X(t)$. The trajectory $X$ is an asymptotic pseudotrajectory
for $\Phi$ if

$$\lim_{t \to \infty} d_{\Phi,t,T}(X) = 0$$

for all $T > 0$.
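To make the definition concrete, the following sketch evaluates the divergence $d_{\Phi,t,T}(X)$ on a grid for a curve that is not a trajectory of the flow but shadows it increasingly well. Everything specific here is an assumption chosen for illustration: the flow is that of the linear field $F(x) = -x$ on the real line, so $\Phi_h(x) = x e^{-h}$, and the curve is $X(t) = 1/(1 + t)$.

```python
import math

def flow(h, x):
    """Flow Phi_h of the hypothetical linear field F(x) = -x."""
    return x * math.exp(-h)

def perturbed_path(t):
    """A curve that is not a trajectory of the flow: X(t) = 1 / (1 + t)."""
    return 1.0 / (1.0 + t)

def divergence(path, t, horizon, grid=1000):
    """Grid approximation of sup over 0 <= h <= T of |X(t+h) - Phi_h(X(t))|."""
    start = path(t)
    return max(abs(path(t + horizon * k / grid) - flow(horizon * k / grid, start))
               for k in range(grid + 1))
```

For a fixed horizon such as $T = 5$, the values `divergence(perturbed_path, t, 5.0)` shrink as `t` grows, which is the behavior the definition asks for; a curve whose divergence stayed bounded away from zero would fail it.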



This definition is important because it generalizes the chain relation so that
divergence from the flow need not occur at discrete points separated by large
times but may occur continuously as long as the divergence remains small over
arbitrarily large intervals. This definition also serves as the intermediary between
stochastic approximations and chain transitive sets, as shown by the next two
results. The first is proved in [Ben99, Proposition 4.4 and Remark 4.5] and the
second in [Ben99, Theorem 5.7].



Theorem 2.13 (stochastic approximations are asymptotic pseudotrajectories).
Let $\{X_n\}$ be a stochastic approximation process, that is, a process satisfying (2.6),
and assume $F$ is Lipschitz. Let $\{X(t) := X_n + (t - n)(X_{n+1} - X_n)$ for
$n \leq t < n + 1\}$ linearly interpolate $X$ at nonintegral times. Assume bounded noise:
$|\xi_n| \leq K$. Then $\{X(t)\}$ is almost surely an asymptotic pseudotrajectory for the
flow $\Phi$ of integral curves of $F$. □



Remark. With deterministic step sizes as in (2.6) one may weaken the bounded
noise assumption to $L^2$-boundedness: $E|\xi_n|^2 \leq K$; the stronger assumption is
needed only under (2.7)-(2.8). The purpose of the Lipschitz assumption on $F$
is to ensure (along with the standing compactness assumption on $M$) that the
flow $\Phi$ is well defined.

The limit set of a trajectory is defined similarly to a forward limit set for
a flow. If $X : \mathbb{R}_+ \to M$ is a trajectory, or $X : \mathbb{Z}_+ \to M$ is a discrete time
trajectory, define

$$L(X) := \bigcap_{t \geq 0} \overline{X([t, \infty))}. \qquad (2.11)$$

Theorem 2.14 (asymptotic pseudotrajectories have chain-transitive limits).
The limit set $L(X)$ of any asymptotic pseudotrajectory, $X$, is chain transitive.
□

Combining Theorems 2.13 and 2.14 and drawing on Proposition 2.10 yields
a frequently used basic result, appearing first in 



Corollary 2.15. Let $X := \{X_n\}$ be a stochastic approximation process with
bounded noise, whose mean vector field $F$ is Lipschitz. Then with probability 1,
the limit set $L(X)$ is chain transitive. In view of Proposition 2.10, it is therefore
invariant, connected, and contains no proper attractor. □



Continuing Example 2.1, the right-hand flow has three connected, closed
invariant sets: $S^1$, $\{a\}$ and $\{b\}$. The flow restricted to either $\{a\}$ or $\{b\}$ is chain
transitive, so either is a possible limit set for $\{X_n\}$, but the whole set is not
chain transitive, and thus may not be the limit set of $\{X_n\}$. We expect to rule out
the repeller $\{a\}$ as well, but it is easy to fabricate a stochastic approximation
that is rigged to converge to $\{a\}$ with positive probability. Further hypotheses
on the noise are required to rule out $\{a\}$ as a limit point. For the left-hand flow,
any of the three invariant sets is possible as a limit set.

Examples such as these show that the approximation heuristic, while useful, is
somewhat weak without the stability heuristic. Turning to the stability heuristic,
one finds better results for convergence than nonconvergence. From [Ben99,
Theorem 7.3], we have:

Theorem 2.16 (convergence to an attractor). Let $A$ be an attractor for the flow
associated to the Lipschitz vector field $F$, the mean vector field for a stochastic
approximation $X := \{X_n\}$. Then either (i) there is a $t$ for which $\{X_{t+s} : s \geq 0\}$
almost surely avoids some neighborhood of $A$ or (ii) there is a positive probability
that $L(X) \subseteq A$.

Proof: A geometric fact requiring no probability is that asymptotic pseudotrajectories
get sucked into attractors. Specifically, let $K$ be a compact neighborhood
of the attractor $A$ for which $\omega(K) = A$ (these exist, by definition of an
attractor). It is shown in [Ben99, Lemma 6.8] that there are $T, \delta > 0$ such that
for any trajectory $X$ starting in $K$, $d_{\Phi,t,T}(X) < \delta$ for all $t$ implies $L(X) \subseteq A$.

Fix such a neighborhood $K$ of $A$ and fix $T, \delta$ as above. By hypothesis, for
any $t > 0$ we may find $X_t \in K$ with positive probability. Theorem 2.13 may be
strengthened to yield a $t$ such that

$$P(d_{\Phi,t,T}(X) < \delta \mid \mathcal{F}_t) > 1/2$$

on the event $X_t \in K$. If $P(X_t \in K) = 0$ then conclusion (i) of the theorem is
true, while if $P(X_t \in K) > 0$, then conclusion (ii) is true. □

For the nonconvergence heuristic, most known results (an exception may be
found in [Pem91]) are proved under linear instability. This is a stronger hypothesis
than topological instability, requiring that at least one eigenvalue of $dF$
have strictly positive real part. An exact formulation may be found in Section 9
of [Ben99]. It is important to note that linear instability is defined there for periodic
orbits as well as rest points, thus yielding conclusions about nonconvergence to
entire orbits, a feature notably lacking in [Pem90a].

Theorem 2.17 ([Ben99, Theorem 9.1]). Let $\{X_n\}$ be a stochastic approximation
process on a compact manifold $M$ with bounded noise $\|\xi_n\| \leq K$ for all $n$
and vector field $F$. Let $\Gamma$ be a linearly unstable equilibrium or periodic orbit
for the flow induced by $F$. Then

$$P\left( \lim_{n \to \infty} d(X_n, \Gamma) = 0 \right) = 0.$$

Proof: The method of proof is to construct a function $\eta$ for which $\eta(X_n)$ obeys
the hypotheses of Theorem 2.9. This relies on known straightening results for
stable manifolds and is carried out in [Pem90a] for $\Gamma = \{p\}$ and in [BH95] for
general $\Gamma$; see also [Bra98]. □



Infinite dimensional spaces 

The stochastic approximation processes discussed up to this point obey equation
(2.6), which presumes the ambient space $\mathbb{R}^d$. In Section 6.1 we will consider
a stochastic approximation on the space $\mathcal{P}(M)$ of probability measures on a
compact manifold $M$. The space $\mathcal{P}(M)$ is compact in the weak topology and
metrizable, hence the topological definitions of limits, attractors and chain transitive
sets are still valid and Theorem 2.14 is still available to force asymptotic
pseudotrajectories to have limit sets that are chain transitive. In fact this justifies
the space devoted in [Ben99] and its predecessors to establishing results
that applied to more than just $\mathbb{R}^d$. The place where new proofs are required is
in proving versions of Theorem 2.13 for processes in infinite-dimensional spaces
(see Theorem 6.4 below).



Lyapunov functions 

A Lyapunov function for a flow $\Phi$ with respect to the compact invariant set $\Lambda$ is
defined to be a continuous function $V : M \to \mathbb{R}$ that is constant on trajectories
in $\Lambda$ and strictly decreasing on trajectories not in $\Lambda$. When $\Lambda = \Lambda_0$, the set
of rest points, existence of a Lyapunov function is equivalent to the flow being
gradient-like. The values $V(\Lambda_0)$ of a Lyapunov function at rest points are called
critical values. Gradient-like flows are geometrically much better behaved than
more general flows, as is shown in [Ben99, Proposition 6.4 and Corollary 6.6]:



Proposition 2.18 (chain transitive sets when there is a Lyapunov function).
Suppose $V$ is a Lyapunov function for a set $\Lambda$ such that the set of values $V(\Lambda)$
has empty interior. Then every chain transitive set $L$ is contained in $\Lambda$ and is a
set of constancy for $V$. In particular, if $\Lambda = \Lambda_0$ and intersects the limit set of
an asymptotic pseudotrajectory $\{X(t)\}$ in at most countably many points, then
$X(t)$ must converge to one of these points. □

It follows that the presence of a Lyapunov function for the vector flow associated
to $F$ implies convergence of $\{X_n\}$ to a set of constancy for the Lyapunov
function. For example, Corollary 2.7 may be proved by constructing a Lyapunov
function with $\Lambda$ = the zero set of $F$. A usual first step in the analysis of a
stochastic approximation is therefore to determine whether there is a Lyapunov
function. When $F = -\nabla V$, of course $V$ itself is a Lyapunov function with $\Lambda$ =
the set of critical points of $V$.
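The last remark can be checked numerically on a toy case. The sketch below follows the flow of $F = -\nabla V$ with explicit Euler steps and records $V$ along the way. The double-well potential $V(x, y) = (x^2 - 1)^2 + y^2$, the step size and the function names are all assumptions made for the example, and Euler stepping only approximates the flow.

```python
def potential(x, y):
    # Hypothetical double-well potential V(x, y) = (x^2 - 1)^2 + y^2.
    return (x * x - 1.0) ** 2 + y * y

def neg_gradient(x, y):
    # The vector field F = -grad V.
    return (-4.0 * x * (x * x - 1.0), -2.0 * y)

def euler_flow(x, y, step=1e-3, n_steps=5000):
    """Follow dX/dt = F(X) by explicit Euler steps, recording V along the path."""
    values = [potential(x, y)]
    for _ in range(n_steps):
        fx, fy = neg_gradient(x, y)
        x, y = x + step * fx, y + step * fy
        values.append(potential(x, y))
    return (x, y), values
```

Started from a point such as `(0.3, 0.8)`, the recorded values of $V$ should never increase and the path should settle near the critical point $(1, 0)$, which is the convergence to a set of constancy described above.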



3. Urn models: theory 



3.1. Time-homogeneous generalized Pólya urns



Recall from Section 2.1 the definition of a generalized Pólya urn with reinforcement
matrix $A$. We saw in Section 2.3 that the resulting urn process
$\{X_n\}$ may be realized as a multitype branching process $\{Z(t)\}$ sampled at
its jump times $\tau_n$. Already in 1965, for the special case of the Friedman urn
with

$$A = \begin{pmatrix} \alpha & \beta \\ \beta & \alpha \end{pmatrix},$$

D. Freedman was able to prove the following limit laws via martingale analysis.

Theorem 3.1. Let $\rho := (\alpha - \beta)/(\alpha + \beta)$. Then

(i) If $\rho > 1/2$ then $n^{-\rho}(R_n - B_n)$ converges almost surely to a nontrivial
random variable;

(ii) If $\rho = 1/2$ then $(n \log n)^{-1/2}(R_n - B_n)$ converges in distribution to a
normal with mean zero and variance $(\alpha - \beta)^2$;

(iii) If $0 \neq \rho < 1/2$ then $n^{-1/2}(R_n - B_n)$ converges in distribution to a normal
with mean zero and variance $(\alpha - \beta)^2/(1 - 2\rho)$.
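Case (iii) lends itself to a quick Monte Carlo check. The sketch below simulates a two-color urn in which a draw adds $\alpha$ balls of the drawn color and $\beta$ of the other color, then estimates the variance of $n^{-1/2}(R_n - B_n)$ over independent runs. The initial contents (one ball of each color), the run lengths and the function names are assumptions for illustration, and a finite-$n$ estimate should only be expected to land near the limiting value.

```python
import random

def friedman_difference(alpha, beta, n_draws, rng, red=1.0, black=1.0):
    """Run one Friedman urn and return R_n - B_n after n_draws draws."""
    for _ in range(n_draws):
        if rng.random() < red / (red + black):
            red, black = red + alpha, black + beta   # red drawn
        else:
            red, black = red + beta, black + alpha   # black drawn
    return red - black

def scaled_variance(alpha, beta, n_draws, n_runs, seed=0):
    """Sample variance of (R_n - B_n) / sqrt(n) over independent runs."""
    rng = random.Random(seed)
    samples = [friedman_difference(alpha, beta, n_draws, rng) / n_draws ** 0.5
               for _ in range(n_runs)]
    mean = sum(samples) / n_runs
    return sum((s - mean) ** 2 for s in samples) / (n_runs - 1)
```

With $\alpha = 2$ and $\beta = 1$ one has $\rho = 1/3$, so the limiting variance in (iii) is $(\alpha - \beta)^2/(1 - 2\rho) = 3$; a call such as `scaled_variance(2.0, 1.0, 2000, 400)` gives an estimate to compare with that value.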

Arguments for these results will be given shortly by means of embedding in
branching processes. Freedman's original proof of (iii) was via moments, estimating
each moment by means of an asymptotic recursion; a readable sketch
of this argument may be found in [Mah03, Section 6]. The present section summarizes
further results that have been obtained via the embedding technique
described in Section 2.3. Such an approach rests on an analysis of limit laws in
multitype branching processes. These are of independent interest and yet it is
interesting to note that such results were not pre-existing. The development of
limit laws for multitype branching processes was motivated in part by applications
to urn processes. In particular, the studies [Ath68] and [Jan04] of multitype limit
laws were motivated respectively by the companion paper [AK68] on urn models
and by applications to urns in [Jan04; Jan05].



The first thorough study of GPU's via embedding was undertaken by Athreya
and Karlin. Although they allow reinforcements to be random, subject to the
condition of finite variance, their results depend only on the mean matrix, again
denoted $A$. They make an irreducibility assumption, namely that $\exp(tA)$ has
positive entries. This streamlines the analysis. While it does not lose too much
generality, it probably caused some interesting phenomena in the complementary
case to remain hidden for another several decades.

The assumptions imply, by the Perron-Frobenius theory, that the leading
eigenvalue of $A$ is real and has multiplicity 1, and that we may write all the
eigenvalues as

$$\lambda_1 > \mathrm{Re}\{\lambda_2\} \geq \cdots \geq \mathrm{Re}\{\lambda_d\}.$$

If we do not allow balls to be subtracted and we rule out the trivial case of no
reinforcement, then $\lambda_1 > 0$. For any right eigenvector $\xi$ with eigenvalue $\lambda$, the
quantity $\xi \cdot Z(t) e^{-\lambda t}$ is easily seen to be a martingale [AK68, Proposition 1].
When $\mathrm{Re}\{\lambda\} > \lambda_1/2$, this martingale is square integrable, leading to an almost
sure limit. This recovers Freedman's first result in two steps, writing
$Z(t) = (R_t, B_t)$. First, taking $\xi = (1, 1)$ and $\lambda = \lambda_1 = \alpha + \beta$, we see that
$R_t + B_t \sim W e^{(\alpha + \beta)t}$ for some random $W > 0$. Secondly, taking $\xi = (1, -1)$
and $\lambda = \alpha - \beta$, we see that $R_t - B_t \sim W' e^{(\alpha - \beta)t}$, with the assumption
$\rho > 1/2$ being exactly what is needed for square integrability. These two almost
sure limit laws imply Freedman's result (i) above.

The analogue of Freedman's result (iii) is that for any eigenvector $\xi$ whose
eigenvalue $\lambda$ has $\mathrm{Re}\{\lambda\} < \lambda_1/2$, the quantity $\xi \cdot X_n / \sqrt{v \cdot X_n}$ converges to a
normal distribution. The greater generality sheds some light on the reason for
the phase transition in the Friedman model at $\rho = 1/2$. For small $\rho$, the mean
drift of $R_n - B_n = u \cdot X_n$ is swamped by the noise coming from the large number
of particles $v \cdot X_n = R_n + B_n$ (here $u = (1, -1)$ and $v = (1, 1)$). For large $\rho$,
early fluctuations in $R_n - B_n$ persist because their mean evolution is of greater
magnitude than the noise.

A distributional limit for $\{X_n = Z(\tau_n)\}$ does not follow automatically from
the limit law for $Z(t)$. A chief contribution of [AK68] is to carry out the necessary
estimates to bridge this gap.



Theorem 3.2 ([AK68, Theorem 3]). Assume finite variances and irreducibility
of the reinforcements. If $\xi$ is a right eigenvector of $A$ whose eigenvalue $\lambda$ satisfies
$\mathrm{Re}\{\lambda\} < \lambda_1/2$ then $\xi \cdot X_n / \sqrt{v \cdot X_n}$ converges to a normal distribution. □

Athreya and Karlin also state that a similar result may be obtained in the
"log" case $\mathrm{Re}\{\lambda\} = \lambda_1/2$, extending Freedman's result (ii), but they do not
provide details.

At some point, perhaps not until the 1990's, it was noticed that there are
interesting cases of GPU's not covered by the analyses of Athreya and Karlin.
In particular, the diagonal entries of $A$ may be between $-1$ and $0$, or enough
of the off-diagonal entries may vanish that $\exp(tA)$ has some vanishing entries;
essentially the only way this can happen is when the urn is triangular, meaning
that in some ordering of the colors, $A_{ij} = 0$ for $i > j$.

The special case of balanced urns, meaning that the row sums of $A$ are constant,
is somewhat easier to analyze combinatorially because the total number of
balls in the urn increases by a constant each time. Even when the reinforcement
is random with mean matrix $A$, the assumption of balance simplifies the analysis.
Under the assumption of balance and tenability (that is, it is not possible
for one of the populations to become negative), a number of analyses have been
undertaken, including [BP85], [Smy96] and [Mah03]; see also [MS92; MS95] for
applications of two-color balanced urns to random recursive trees, and [Mah98]
for a tree application of a three-color balanced urn. Exact solutions to two-color
balanced urns involve number theoretic phenomena which are described
in [FGP05].



Without the assumption of balance, results on triangular urns date back at
least to [DV97]. Their chief results are for two colors, and their method is to
analyze the simultaneous functional equations satisfied by the generating functions.
Kotz, Mahmoud and Robert [KMR00] concern themselves with removing
the balance assumption, attacking a special case of $A$ by combinatorial
means. A martingale-based analysis of two particular choices of $A$
is hidden in [PV99]. The latter of these, the diagonal case, had appeared in various
places dating back to [Ros40], the result being as follows.



Theorem 3.3 (diagonal urn). Let $a > b > 0$ and consider a GPU with reinforcement
matrix

$$A = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}.$$

Then $R_n / B_n^{\rho}$ converges almost surely to a nonzero finite limit, where $\rho := a/b$.

Proof: From branching process theory there are variables $W, W'$ with $e^{-at} R_t \to W$
and $e^{-bt} B_t \to W'$. This implies $R_t / B_t^{\rho}$ converges to the random variable
$W/(W')^{\rho}$, which gives convergence of $R_n / B_n^{\rho}$ to the same quantity. □

Given the piecemeal approaches to GPU's it is fitting that more comprehensive
analyses finally emerged. These are due to Janson [Jan04; Jan05]. The first
of these is via the embedding approach. The matrix $A$ may be of any finite
size, diagonal entries may be as small as $-1$, and the irreducibility assumption
is weakened to the largest eigenvalue $\lambda_1$ having multiplicity 1 and being
"dominant". This last requirement is removed in [Jan05], which combines the
embedding approach with some computations at times $\tau_n$ via generating functions,
thus bypassing the need for converting distributional limit theorems in
$Z(t)$ to the stopping times $\tau_n$. The results, given in terms of projections of $A$
onto various subspaces, are somewhat unwieldy to formulate and will not be
reproduced here. As far as I can tell, Janson's results do subsume pretty much
everything previously known. For example, the logarithmic scaling result appearing
in a crude form in [PV99, Theorem 2.3] and elsewhere was proved as
Theorem 1.3 (iv) of [Jan05]:



Theorem 3.4. Let $R_n$ and $B_n$ be the counts of the two colors of balls in a Friedman
urn with $A = \begin{pmatrix} 1 & 0 \\ c & 1 \end{pmatrix}$. Then the quantity $R_n/(cB_n) - \log B_n$ converges
almost surely to a random finite limit. Equivalently,

$$\frac{(\log n)^2}{n} \left( B_n - \frac{n}{c \log n} - \frac{n \log\log n}{c (\log n)^2} \right) \qquad (3.1)$$

converges to a random finite limit. □

To verify the equivalence of the two versions of the conclusion, found respectively
in [PV99] and [Jan05], use the deterministic relation
$R_n = R_0 + n + (c - 1)(B_n - B_0)$ to see that convergence of $R_n/(cB_n) - \log B_n$ is equivalent to

$$\frac{n}{cB_n} - \log B_n = Z + o(1) \qquad (3.2)$$

for some finite random $Z$. Also, both versions of the conclusion imply
$\log(n/B_n) = \log\log n + \log c + o(1)$ and $\log\log n = \log\log B_n + o(1)$. It follows then that (3.2)
is equivalent to

$$\begin{aligned}
B_n &= \frac{n}{c \log B_n + cZ + o(1)} \\
    &= \frac{n}{c \log n} \left( 1 - \frac{\log B_n - \log n}{\log n} - \frac{Z + o(1)}{\log n} \right) \\
    &= \frac{n}{c \log n} \left( 1 + \frac{\log(n/B_n)}{\log n} - \frac{Z + o(1)}{\log n} \right) \\
    &= \frac{n}{c \log n} \left( 1 + \frac{\log\log n}{\log n} - \frac{Z - \log c + o(1)}{\log n} \right),
\end{aligned}$$

which is equivalent to the convergence of (3.1) to the random limit $-c^{-1}(Z - \log c)$.


3.2. Some variations on the generalized Pólya urn

Dependence on time 

The time-dependent urn is a two-color urn, where only the color drawn is
reinforced; the number of reinforcements added at time $n$ is not independent
of $n$ but is given by a deterministic sequence of positive real numbers
$\{a_n : n = 0, 1, 2, \ldots\}$. This is introduced in [Pem90b] with a story about modeling
American primary elections. Denote the contents by $R_n$, $B_n$ and
$X_n = R_n/(R_n + B_n)$ as usual. It is easy to see that $X_n$ is a martingale, and the fact that the
almost sure limit has no atoms in the open interval $(0, 1)$ may be shown via
the same three-step nonconvergence argument used to prove Theorem 2.9. The
question of atoms among the endpoints $\{0, 1\}$ is more delicate. It turns out there
is an exact recurrence for the variance of $X_n$, which leads to a characterization
of when the almost sure limit is supported on $\{0, 1\}$.



Theorem 3.5 ([Pem90b, Theorem 2]). Define $\delta_n := a_n / (R_0 + B_0 + \sum_{j=0}^{n-1} a_j)$
to be the ratio of the $n$th increment to the volume of the urn before the increment
is added. Then $\lim_{n \to \infty} X_n \in \{0, 1\}$ almost surely if and only if
$\sum_{n=1}^{\infty} \delta_n^2 = \infty$. □

Note that the almost sure convergence of $X_n$ to $\{0, 1\}$ is not the same as
convergence of $X_n$ to $\{0, 1\}$ with positive probability: the latter but not the
former happens when $a_n = n$. It is also not the same as almost surely choosing
one color only finitely often. No sharp criterion is known for positive probability
of $\lim_{n \to \infty} X_n \in \{0, 1\}$, but it is known [Pem90b, Theorem 4] that this cannot
happen when $\sup_n a_n < \infty$.
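The criterion is easy to evaluate for explicit sequences. The sketch below computes partial sums of $\delta_n^2$, assuming the criterion reads as divergence of $\sum \delta_n^2$ and that $\delta_n$ divides $a_n$ by the urn volume just before the $n$th increment. The initial contents $R_0 = B_0 = 1$ and the function name are hypothetical.

```python
def delta_squared_partial_sum(increments, r0=1.0, b0=1.0):
    """Sum of delta_n^2 over n >= 1, with delta_n = a_n / (volume before a_n)."""
    volume = r0 + b0
    total = 0.0
    for n, a_n in enumerate(increments):
        if n >= 1:
            total += (a_n / volume) ** 2
        volume += a_n   # volume now includes a_0, ..., a_n
    return total
```

For the constant sequence $a_n = 1$ (the classical Pólya urn) and for $a_n = n$ the partial sums stay bounded, whereas for a geometric sequence such as $a_n = 2^n$ each term is bounded below and the partial sums grow without bound, which is the regime in which the limit is forced onto $\{0, 1\}$.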

Ordinal dependence 

A related variation adds $a_n$ red balls the $n$th time a red ball is drawn and $a_n$
black balls the $n$th time a black ball is drawn. As is characteristic of such models,
a seemingly small change in the definition leads to different behavior, and to
an entirely different method of analysis. One may in fact generalize so that the
$n$th reinforcement of a black ball is of size $a'_n$, not in general equal to $a_n$. The
following result appears in the appendix of [Dav90] and is proved by Rubin's
exponential embedding.

Theorem 3.6 (Rubin's Theorem). Let $S_n := \sum_{k=0}^{n} a_k$ and $S'_n := \sum_{k=0}^{n} a'_k$. Let
$G$ denote the event that all but finitely many draws are red, and $G'$ the event
that all but finitely many draws are black. Then

(i) If $\sum_{n=0}^{\infty} 1/S_n = \infty = \sum_{n=0}^{\infty} 1/S'_n$ then $P(G) = P(G') = 0$;

(ii) If $\sum_{n=0}^{\infty} 1/S_n = \infty > \sum_{n=0}^{\infty} 1/S'_n$ then $P(G') = 1$;

(iii) If $\sum_{n=0}^{\infty} 1/S_n, \sum_{n=0}^{\infty} 1/S'_n < \infty$ then $P(G), P(G') > 0$ and $P(G) + P(G') = 1$.

Proof: Let $\{Y_n, Y'_n : n = 0, 1, 2, \ldots\}$ be independent exponential with respective
means $1/S_n$ and $1/S'_n$. We think of the sequence $Y_0, Y_0 + Y_1, \ldots$ as successive
times of an alarm clock. Let $R(t) = \sup\{n : \sum_{k=0}^{n} Y_k < t\}$ be the number of
alarms up to time $t$, and similarly let $B(t) = \sup\{n : \sum_{k=0}^{n} Y'_k < t\}$ be the number
of alarms in the primed variables up to time $t$. If $\{\tau_n\}$ are the successive
jump times of the pair $(R(t), B(t))$ then $(R(\tau_n), B(\tau_n))$ is a copy of the
Davis-Rubin urn process. The theorem follows immediately from this representation,
and from the fact that $\sum_{n=0}^{\infty} Y_n$ is finite if and only if its mean is finite (in which
case "explosion" occurs) and has no atoms when finite. □
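The embedding in the proof can be simulated directly. The sketch below builds two independent alarm clocks, where the waiting time for the next red alarm is exponential with rate equal to the current red weight (and similarly for black), and merges them into a sequence of draws. It is a truncated illustration under assumptions: only finitely many alarms are generated, the merge stops when the first clock has used all of its alarms, and the function and argument names are hypothetical.

```python
import random

def rubin_draws(red_weights, black_weights, rng):
    """Merge two independent alarm clocks into a sequence of urn draws.

    red_weights[k] plays the role of S_k: the waiting time for red alarm
    number k is exponential with mean 1 / S_k (likewise for black).
    """
    def alarm_times(weights):
        clock, times = 0.0, []
        for w in weights:
            clock += rng.expovariate(w)   # rate w, so mean 1 / w
            times.append(clock)
        return times

    red_times = alarm_times(red_weights)
    black_times = alarm_times(black_weights)
    horizon = min(red_times[-1], black_times[-1])  # first clock to run out
    events = [(t, "R") for t in red_times if t <= horizon]
    events += [(t, "B") for t in black_times if t <= horizon]
    return [color for _, color in sorted(events)]
```

When both weight sequences grow like $(k + 1)^2$, both clocks explode in finite time and the color whose clock explodes first takes all but finitely many draws, as in case (iii). When the red weights grow like $(k + 1)^2$ but the black weights stay constant, only the red clock explodes and black is drawn only a handful of times.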



Altering the draw 



Mahmoud [Mah04] considers an urn model in which each draw consists of $k$ balls
rather than just one. There are $k + 1$ possible reinforcements depending on how
many red balls there are in the sample. This is related to the model of Hill, Lane
and Sudderth [HLS80] in which one ball is added each time but the probability
it is red is not $X_n$ but $f(X_n)$ for some function $f : [0, 1] \to [0, 1]$. The end of
Section 2.4 introduced the example of majority draw: if three balls are drawn
and the majority is reinforced, then $f(x) = x^3 + 3x^2(1 - x)$ is the probability
that a majority of three will be red when the proportion of reds is $x$. If one
samples with replacement in Mahmoud's model and limits the reinforcement to
a single ball, then one obtains another special case of the model of Hill, Lane
and Sudderth.

A common generalization of these models is to define a family of probability
distributions $\{G_x : 0 \leq x \leq 1\}$ on pairs $(Y, Z)$ of nonnegative real numbers, and
to reinforce by a fresh draw from $G_x$ when $X_n = x$. If $G_x$ puts mass $f(x)$ on
$(1, 0)$ and $1 - f(x)$ on $(0, 1)$, this gives the Hill-Lane-Sudderth urn; an identical
model appears in [AEK83]. If $G_x$ gives probability $\binom{k}{j} x^j (1 - x)^{k-j}$ to the pair
$(a_{1j}, a_{2j})$ for $0 \leq j \leq k$ then this gives Mahmoud's urn with sample size $k$ and
reinforcement matrix $a$.

When $G_x$ are all supported on a bounded set, the model fits in the stochastic
approximation framework of Section 2.4. For two-color urns, the dimension of
the space is 1, and the vector field is a scalar field $F(x) = \mu(x) - x$ where $\mu(x)$
is the mean of $G_x$. As we have already seen, under weak conditions on $F$, the
proportion $X_n$ of red balls must converge to a zero of $F$, with points at which the
graph of $F$ crosses the $x$-axis in the downward direction (such as the point 1/2
in a Friedman urn) occurring as the limit with positive probability and points
where the graph of $F$ crosses the $x$-axis in an upward direction (such as the
point 1/2 in the majority vote model) occurring as the limit with probability
zero.
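The two examples in parentheses can be checked with a few lines of code. The sketch below recomputes the majority-of-three probability by enumerating binomial outcomes and classifies the direction in which a scalar field crosses the $x$-axis at a zero. The Friedman field used here, the expected fraction of red among the $\alpha + \beta$ added balls minus $x$, is my own reading of how that urn fits the framework, and all function names are hypothetical.

```python
from math import comb

def majority_probability(x, k=3):
    """P(more than half of k draws with replacement are red) at red fraction x."""
    return sum(comb(k, j) * x ** j * (1 - x) ** (k - j)
               for j in range(k // 2 + 1, k + 1))

def majority_field(x):
    # F(x) = f(x) - x for the majority-of-three rule.
    return majority_probability(x) - x

def friedman_field(x, alpha=2.0, beta=1.0):
    # Assumed form: expected red fraction of the added balls, minus x.
    return (alpha * x + beta * (1 - x)) / (alpha + beta) - x

def crossing_direction(field, p, eps=1e-4):
    """Classify how the graph of `field` crosses the x-axis at a zero p."""
    left, right = field(p - eps), field(p + eps)
    if left > 0 > right:
        return "down"
    if left < 0 < right:
        return "up"
    return "touch or degenerate"
```

Here `crossing_direction(friedman_field, 0.5)` reports a downward crossing and `crossing_direction(majority_field, 0.5)` an upward one, matching the roles the two points play in the paragraph above.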

Suppose $F$ is a continuous function and the graph of $F$ touches the $x$-axis
at $(p, 0)$ but does not cross it. The question of whether $X_n \to p$ with positive
probability is then more delicate. On one side of $p$, the drift is toward $p$ and on
the other side of $p$ the drift is away from $p$. It turns out that convergence can only
occur if $X_n$ stays on the side where the drift is toward $p$, and this can only happen
if the drift is small enough. A curve tangent to the $x$-axis always yields small
enough drift that convergence is possible. The phase transition occurs when the
one-sided derivative of $F$ is $-1/2$. More specifically, it is shown in [Pem91] that
(i) if $0 < F(x) \leq (p - x)/(2 + \epsilon)$ on some neighborhood $(p - \epsilon, p)$ then $X_n \to p$
with positive probability, while (ii) if $F(x) > (p - x)/(2 - \epsilon)$ on a neighborhood
$(p - \epsilon, p)$ and $F(x) > 0$ on a neighborhood $(p, p + \epsilon)$, then $P(X_n \to p) = 0$. The
proof of (i) consists of establishing a power-law lower bound on $p - X_n$, precluding
$X_n$ from ever exceeding $p$.

The paper [AEK83] introduces the same model with an arbitrary finite number
of colors. When the number of colors is $d + 1$, the state vector $X_n$ lives
in the $d$-simplex $\Delta^d := \{(x_1, \ldots, x_{d+1}) \in (\mathbb{R}_+)^{d+1} : \sum_i x_i = 1\}$. Under
relatively strong conditions, they prove convergence with probability 1 to a global
attractor. A recent variation by Siegmund and Yakir weakens the hypothesis
of a global attractor to allow for finitely many non-attracting fixed points on
$\partial\Delta^d$ [SY05, Theorem 2.2]. They apply their result to an urn model in which
balls are labeled by elements of a finite group: balls are drawn two at a time,
and the result of drawing $g$ and $h$ is to place an extra ball of type $g \cdot h$ in the urn.
The result is that the contents of the urn converge to the uniform distribution
on the subgroup generated by the initial contents.

All of this has been superseded by the stochastic approximation framework of
Benaïm et al. While convergence to attractors and nonconvergence to repelling
sets is now understood, at least in the hyperbolic case (where no eigenvalue of
$dF(p)$ has vanishing real part), some questions still remain. In particular, the
estimation of deviation probabilities has not yet been carried out. One may ask,
for example, how the probability of being at least $\epsilon$ away from a global attractor
at time $n$ decreases with $n$, or how fast the probability of being within $\epsilon$ of a
repeller at time $n$ decreases with $n$. These questions appear related to quantitative
estimates on the proximity to which $\{X_n\}$ shadows the vector flow
associated to $F$ (cf. the Shadowing Theorem of Benaïm and Hirsch [Ben99,
Theorem 8.9]).



4. Urn models: applications 



In this section, the focus is on modeling rather than theory. Most of the examples 
contain no significant new mathematical results, but are chosen for inclusion 
here because they use reinforcement models (mostly urn models) to explain 
and predict physical or behavioral phenomena or to provide quick and robust 
algorithms. 



4.1. Self-organization 

The term self-organization is used for systems which, due to micro-level in- 
teraction rules, attain a level of coordination across space or time. The term 
is applied to models from statistical physics, but we are concerned here with 
self-organization in dynamical models of social networks. Here, self-organization 
usually connotes a coordination which may be a random limit and is not explic- 
itly programmed into the evolution rules. The Polya urn is an example of this: 
the coordination is the approach of Xn to a limit; the limit is random and its 
sample values are not inherent in the reinforcement rule. 



Market share 



One very broad application of Pólya-like urn models is as a simplified but
plausible micro-level mechanism to explain the so-called "lock-in" phenomenon in
industrial or consumer behavior. The questions are why one technology is cho- 
sen over another (think of the VHS versus Betamax standard for videotape), 
why the locations of industrial sites exhibit clustering behavior, and so forth. In 
a series of articles in the 1980's, Stanford economist W. Brian Arthur proposed 
urn models for this type of social or industrial process, matching data to the 
predictions of some of the models. Arthur used only very simple urn models, 
most of which were not new, but his conclusions evidently resonated with the 
economics community. The stories he associated with the models included the 
following. 

Random limiting market share: Suppose two technologies (say Apple versus
IBM) are selectively neutral (neither is clearly better) and enter the market
at roughly the same time. Suppose that new consumers choose which of the two
to buy in proportion to the numbers already possessed by previous consumers.
This is the basic Pólya urn model, leading to a random limiting market share:
$X_n \to X$. In the case of Apple computers, the sample value of $X$ is between
10% and 15%. This model is discussed at length in [AEK87].

Random monopoly: Still assuming no intrinsic advantage, suppose that economies
of scale lead to future adoption rates proportional to a power $\alpha > 1$ of
present market share. This particular one-dimensional GPU is of the type in
Theorem 2.8 (a Hill-Lane-Sudderth urn) with

$$F(x) = \frac{x^\alpha}{x^\alpha + (1-x)^\alpha} - x . \qquad (4.1)$$

The graph of $F$ is shaped as in Figure 2 below. The equilibrium at $x = 1/2$ is
unstable and $X_n$ converges almost surely to 0 or 1. Which of these two occurs
depends on chance fluctuations near the beginning of the run. In fact such
qualitative behavior persists even if one of the technologies does have an
intrinsic advantage, as long as the shape of $F$ remains qualitatively the same.
The possibility of an eventual monopoly by an inferior technology is discussed
as well in [AEK87] and in the popular account [Art90]. The particular $F$ of
(4.1) leads to interesting quantitative questions as to the time the system can
spend in disequilibrium, which are discussed in [CL06b; OS05].



[Figure 2: plot of the urn function F; vertical axis ticks at -0.2 and 0.2.]

Fig 2. The urn function F for the power law market share model
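
The random monopoly story can be explored numerically. The sketch below assumes the urn function takes the form $F(x) = x^\alpha/(x^\alpha + (1-x)^\alpha) - x$ and that one ball is added per step; the function names, the starting composition and the seed are illustrative choices, not taken from [AEK87].

```python
import random

def urn_function(x, alpha):
    """Drift F(x) = x^a / (x^a + (1-x)^a) - x of the power-law urn."""
    num = x ** alpha
    return num / (num + (1.0 - x) ** alpha) - x

def simulate_share(alpha, steps, red=1, black=1, seed=0):
    """Return the final share of 'red' after `steps` single-ball additions."""
    rng = random.Random(seed)
    for _ in range(steps):
        x = red / (red + black)
        # Probability that the next adopter picks red is F(x) + x.
        p_red = urn_function(x, alpha) + x
        if rng.random() < p_red:
            red += 1
        else:
            black += 1
    return red / (red + black)
```

Running `simulate_share(2.0, 100000, seed=s)` for several seeds `s` gives a rough picture of how early fluctuations decide which technology takes over.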



Neuron polarity 



The following model for neuron growth is mathematically almost identical.
The motivating biological question concerns the mechanisms
by which apparently identical cells develop into different types. This is poorly
understood in many important developmental processes. Khanin and Khanin
examine the development of neurons into two types: axon and dendrite.
Indistinguishable at first, groups of such cells exhibit periods of growth and
retraction until one rapidly elongates to eventually become an axon [KK01, page 1].
They note experimental data suggesting that any neuron has the potential to be
either type, and hypotheses that a neuron's length at various stages of growth
relative to nearby neurons may influence its development.

They propose an urn model where at each discrete time one of the existing
neurons grows by a constant length, $\ell$, and the others do not grow. The
probability of being selected to grow is proportional to the $\alpha$-power of
its length, for some parameter $\alpha > 0$. They give rigorous proofs of the
long-term behavior in three cases. When $\alpha > 1$, they quote Rubin's Theorem
from [Dav90] to show that after a certain random time, only one neuron grows.
When $\alpha = 1$, they cite results on the classical Pólya urn from [Fel68] to
show that the pairwise length ratios have random finite limits. When
$\alpha < 1$, they use embedding methods to show that every pair of lengths has
ratio equal to 1 in the limit and to show fluctuations that are Gaussian when
$\alpha < 1/2$, Gaussian with a logarithm in the scaling when $\alpha = 1/2$, and
differing by $t^\alpha$ times a random limiting constant when
$\alpha \in (1/2, 1)$ (cf. Freedman's results quoted in Section 3.1).



Preferential attachment 

Another self-organization story has to do with random networks. Models of
random networks are used to model the internet, trade, political persuasion
and a host of other phenomena. Mathematically, the best studied model is the
Erdős-Rényi model where each possible edge is present independently with some
probability $p$. For the purposes of many applications, two properties are
desirable that do not occur in the Erdős-Rényi model. First, empirical studies
show that the distribution of vertex degrees should follow a power law rather
than be tightly clustered around its mean. Secondly, there should be local
clustering but global connectivity, meaning roughly that as the number of
vertices goes to infinity with the average degree constant, the graph-theoretic
distance between typical vertices should be small (logarithmic) but the
collection of geodesics should have bottlenecks at certain "hub" vertices.

A model known as the small-world model was introduced by Watts and
Strogatz [WS98], who were interested in the "six degrees of separation"
phenomenon (essentially the empirical fact that the graph of humans and
acquaintanceship has local clustering and global connectivity). Their graph is a
random perturbation of a nearest neighbor graph. It does exhibit local clustering
and global connectivity but not the power-law variation of degrees, and is not
easy to work with. A model with the flexibility to fit an arbitrary degree
profile was proposed by Chung and Graham and analyzed in [CL03]. This static
model is flexible, tractable and provides graphs that match data. Neither this
nor the small-world model, however, provides a micro-level explanation of the
formation of the graph. A collection of dynamic growth urn models, known as
preferential attachment models, the first of which was introduced by Barabási
and Albert [BA99], has been developed in order to address this need.



Let a parameter $\alpha \in [0, 1]$ be chosen and construct a growing sequence
of graphs $\{G_n\}$ on the vertex set $\{1, \ldots, n\}$ as follows. Let $G_1$ be
the unique graph on one vertex. Given $G_n$, let $G_{n+1}$ be obtained from $G_n$
by adding a single vertex labeled $n+1$ along with a single edge connecting
$n+1$ to a random vertex $V_n \in G_n$. With probability $\alpha$ the vertex
$V_n$ is chosen uniformly from $\{1, \ldots, n\}$, while with probability
$1 - \alpha$ the probability that $V_n = v$ is taken to be proportional to the
degree of $v$.

This procedure always produces a tree. When $\alpha = 1$, this is a well known
recursive tree. The other extreme case $\alpha = 0$ may be regarded as pure
preferential attachment. A modification is to add some fixed number $m$ of new
edges each time, choosing each independently according to the procedure in the
case of $m = 1$ and handling collisions among these $m$ new edges by some
arbitrary re-sampling scheme. This procedure produces a directed graph that is
not, in general, a tree. We denote this random graph by $G_n^{\alpha,m}$.
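
The construction for $m = 1$ can be sketched in a few lines. The code below is only an illustration under stated assumptions: the function name is hypothetical, and because the single starting vertex has degree zero, the first attachment is assumed to fall back to a uniform choice (the text leaves this initial condition open).

```python
import random

def grow_tree(n, alpha, seed=0):
    """Grow the m = 1 graph on vertices 1..n; return (parent, degree)."""
    rng = random.Random(seed)
    degree = [0, 0]      # degree[v] for v >= 1; index 0 is unused
    endpoints = []       # both endpoints of every edge; a uniform draw
                         # from this list is a degree-proportional draw
    parent = {}
    for new in range(2, n + 1):
        existing = new - 1
        if not endpoints or rng.random() < alpha:
            target = rng.randint(1, existing)   # uniform choice
        else:
            target = rng.choice(endpoints)      # proportional to degree
        parent[new] = target
        degree.append(1)                        # the new vertex has one edge
        degree[target] += 1
        endpoints.extend((new, target))
    return parent, degree
```

Keeping a flat list of edge endpoints is a design choice that makes degree-proportional sampling a single uniform draw, at the cost of memory linear in the number of edges.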

Preferential attachment models, also known as rich get richer models, are
examples of scale-free models. The power laws they exhibit have been fit to
data many times, e.g., in figure 1 of [BA99]. Preferential attachment graphs have
also been used as the underlying graphs for models of interacting systems. For
example, [KKO+05] examines a market pricing model known as the graphical
Fisher model for price setting. In this model, there is a bipartite graph whose
vertices are vendors and buyers. Each buyer buys a unit of goods from the
cheapest neighboring vendor, with the vendors trying to set prices as high as
possible while still selling all their goods. The emergent prices are entirely a
function of the graph structure. In [KKO+05], the graph is taken to be a
bipartite version of $G_n^{\alpha,m}$ and the prices are shown to vary only when
$m = 1$.

A number of nonrigorous arguments for the degree profile of $G_n^{\alpha,m}$
appear in the literature. For example, in Barabási and Albert's original paper,
the following heuristic argument is given for the case $\alpha = 0$; see also
[Mit03]. Consider the vertex $v$ added at time $k$. Let us use an urn model to
keep track of its degree. There will be a red ball for each edge incident to $v$
and a black ball for each half of each edge not incident to $v$. The urn begins
with $2km$ balls, of which $m$ are red. At each time step a total of $2m$ balls
are added. Half of these are always colored black (half-edges incident to $m$ new
vertices) while half are colored by choosing from the urn. Let $R_l$ be the
number of red balls in the urn at time $l$. Then

$$\mathbf{E}(R_{l+1} \mid R_l) = R_l + m \, \frac{R_l}{2lm} = R_l \left( 1 + \frac{1}{2l} \right)$$

and hence

$$\mathbf{E} R_n = m \prod_{l=k}^{n-1} \left( 1 + \frac{1}{2l} \right) \sim m \sqrt{\frac{n}{k}} .$$



(Footnote: see the Wikipedia entry for "scale-free network".)

Thus far, the urn analysis is rigorous. The heuristic now proposes that the
degree of each vertex is exactly the greatest integer below this. Solving for $k$
so that the vertex has degree $d$ at time $n$ gives $k$ as a function of $d$:
$k(d) = m^2 n/d^2$. The number of $k$ for which the expected degree is between
$d$ and $d+1$ is $\lfloor k(d) \rfloor - \lfloor k(d+1) \rfloor$; this is roughly
the derivative with respect to $-d$ of $k(d)$, namely $2m^2 n/d^3$. Thus the
fraction of vertices having degree exactly $d$ should be asymptotic to
$2m^2/d^3$.
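
The rigorous part of the heuristic, namely that the recursion $\mathbf{E}(R_{l+1} \mid R_l) = R_l(1 + 1/(2l))$ started from $R_k = m$ gives an expected degree close to $m\sqrt{n/k}$, is easy to check numerically. The function names below are illustrative.

```python
import math

def expected_degree(m, k, n):
    """Exact E R_n = m * prod_{l=k}^{n-1} (1 + 1/(2l)) from the urn recursion."""
    value = float(m)
    for l in range(k, n):
        value *= 1.0 + 1.0 / (2 * l)
    return value

def heuristic_degree(m, k, n):
    """Asymptotic form m * sqrt(n / k)."""
    return m * math.sqrt(n / k)
```

For moderately large $k$ the two agree closely; the relative error is of order $1/k$, so the approximation is poorest for the earliest vertices, which are also the ones with the largest degrees.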



Chapter 3 of the forthcoming book of Chung and Lu [CL06a] will contain the
first rigorous and somewhat comprehensive treatment of preferential attachment
schemes (see the discussion in their Section 3.2 of the perils of unjustified
heuristics with regard to this model). The only published, rigorous analysis of
preferential attachment that I know of is by Bollobás et al. [BRST01] and is
restricted to the case $\alpha = 0$. Bollobás et al. clean up the definition of
$G_n^{0,m}$ with regard to the initial conditions and the procedure for
resolving collisions. They then prove the following theorem.

Theorem 4.1 (degrees in the pure preferential attachment graph). Let

$$\beta(m, d) := \frac{2m(m+1)}{(m+d)(m+d+1)(m+d+2)}$$

and let $X_{n,m,d}$ denote the proportion among all $n$ vertices of $G_n^{0,m}$
that have degree $m + d$ (that is, they have in-degree $d$ when edges are
directed toward the original vertex). Then both

$$\inf_{d \le n^{1/15}} \frac{X_{n,m,d}}{\beta(m, d)}$$

and

$$\sup_{d \le n^{1/15}} \frac{X_{n,m,d}}{\beta(m, d)}$$

converge to 1 in probability as $n \to \infty$. □

As $d \to \infty$ with $m$ fixed, $\beta(m, d)$ is asymptotic to $2m^2 d^{-3}$.
This agrees, as an asymptotic, with the heuristic for $\alpha = 0$, while
providing more information for small $d$. The method of proof is to use Azuma's
inequality on the filtration $\sigma(G_n^{0,m} : n = 1, 2, \ldots)$; once this
concentration inequality is established, a relatively easy computation finishes
the proof by showing convergence of $\mathbf{E} X_{n,m,d}$ to $\beta(m, d)$.
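
As a consistency check on the limiting profile, one can verify in exact arithmetic that $\beta(m, d) = 2m(m+1)/((m+d)(m+d+1)(m+d+2))$ sums to 1 over $d \ge 0$. The partial sums telescope to $1 - m(m+1)/((m+D+1)(m+D+2))$; this identity is a small side computation, not part of the cited proof, and the function names are illustrative.

```python
from fractions import Fraction

def beta(m, d):
    """Limiting proportion of vertices with in-degree d."""
    return Fraction(2 * m * (m + 1), (m + d) * (m + d + 1) * (m + d + 2))

def partial_sum(m, D):
    """Sum of beta(m, d) over d = 0, ..., D, in exact arithmetic."""
    return sum(beta(m, d) for d in range(D + 1))
```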



4.2. Statistics

We saw in Theorem 2.1 that the fraction of red balls in a Pólya urn with initial
composition $(R(0), B(0))$ converges almost surely and that the limit
distribution is $\beta(R(0), B(0))$. Because the sequence of draws is
exchangeable, de Finetti's Theorem allows us to interpret the Pólya process as
Bayesian observation of a coin with unknown bias, $p$, with a
$\beta(R(0), B(0))$ prior on $p$, the probability of flipping "Red" (see the
discussion in Section 2.2). Each new flip changes our posterior on $p$, the new
posterior after $n$ observations being exactly $\beta(R(n), B(n))$.
When $R(0) = B(0) = 1$, the prior is uniform on $[0, 1]$. According to [Fel68,
Chapter V, Section 2], Laplace used this model for a tongue-in-cheek estimate
that the odds are 1.8 million to one in favor of the sun rising tomorrow; this is
based on a record of the sun having risen every day in the modern era (about
5,000 years or 1.8 million days).
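
Laplace's estimate can be reproduced from the urn directly: with a uniform prior, the predictive probability of another "Red" is the current fraction of red balls. The helper names below are illustrative.

```python
from fractions import Fraction

def posterior_params(red0, black0, successes, failures):
    """Beta posterior parameters, i.e. the urn contents after the draws."""
    return red0 + successes, black0 + failures

def predictive_success(red0, black0, successes, failures):
    """Probability that the next draw is red, given the record so far."""
    r, b = posterior_params(red0, black0, successes, failures)
    return Fraction(r, r + b)
```

With `days = 1_800_000`, the value `p = predictive_success(1, 1, days, 0)` gives odds `p / (1 - p)` of `days + 1` to one, in line with the figure quoted above.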



Dirichlet distributions 

The urn representation of the $\beta$ distribution generalizes in the following
manner to any number of colors. Consider a $d$-color Pólya urn with initial
quantities $R_1(0), \ldots, R_d(0)$. Blackwell and MacQueen [BM73, Theorem 1]
showed that the limiting distribution is a Dirichlet distribution with
parameters $(R_1(0), \ldots, R_d(0))$, where the Dirichlet distribution with
parameters $(\alpha_1, \ldots, \alpha_d)$ is defined to be the measure on the
$(d-1)$-simplex with density

$$\frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \prod_{j=1}^{d} x_j^{\alpha_j - 1} \, dx_1 \cdots dx_{d-1} . \qquad (4.2)$$



The Dirichlet distribution has important statistical properties, some of which
we now discuss. Ferguson [Fer73] gives a formula and a discussion of the history.
It was long known to Bayesians as the conjugate prior for the parameters of a
multinomial distribution (Ferguson refers to [Goo65] for this fact). Thus, for
example, the sequence of colors drawn from an urn with initial composition
$(1, \ldots, 1)$ are distributed as flips of a $d$-sided coin whose probability
vector is drawn from a prior that is uniform on the $(d-1)$-simplex; the
posterior after $n$ flips will be a Dirichlet with parameters
$(R_1(n), \ldots, R_d(n))$.
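
The conjugacy statement says that the urn contents are exactly the posterior Dirichlet parameters. The following sketch makes this bookkeeping explicit; the function names and the seed are illustrative.

```python
import random

def polya_draws(initial, n, seed=0):
    """Draw n colors from a d-color Polya urn; return (draws, final contents)."""
    rng = random.Random(seed)
    contents = list(initial)
    draws = []
    for _ in range(n):
        color = rng.choices(range(len(contents)), weights=contents)[0]
        contents[color] += 1          # reinforce the drawn color
        draws.append(color)
    return draws, contents

def dirichlet_posterior(prior, draws):
    """Conjugate update: add the observed counts to the prior parameters."""
    post = list(prior)
    for color in draws:
        post[color] += 1
    return post
```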

Given a finite measure $\alpha$ on a space $S$, the Dirichlet process with
reference measure $\alpha$ is a random measure $\nu$ on $S$ such that for any
disjoint sets $A_1, \ldots, A_d$, the vector of random measures
$(\nu(A_1), \ldots, \nu(A_d))$ has a Dirichlet distribution with parameters
$(\alpha(A_1), \ldots, \alpha(A_d))$. We denote the law of $\nu$ by
$\mathcal{D}(\alpha)$. Because Dirichlet distributions are supported on the unit
simplex, the random measure $\nu$ is almost surely a probability measure.

Ferguson [Fer73] suggests using the Dirichlet process as a natural,
uninformative prior on the space of probability measures on $S$. Its chief virtue
is the ease of computing the posterior: Ferguson shows that after observing
independent samples $x_1, \ldots, x_n$ from an unknown measure $\nu$ distributed
as $\mathcal{D}(\alpha)$, the posterior for $\nu$ is
$\mathcal{D}(\alpha + \sum_{k=1}^{n} \delta(x_k))$, where $\delta(x_k)$ is a
point mass at $x_k$. A corollary of this is a beautiful urn representation for
$\mathcal{D}(\alpha)$: it is the limiting contents of an $S$-colored Pólya urn
with initial "contents" equal to $\alpha$. A second virtue of the
Dirichlet prior is that it is weakly dense in the space of probability measures on
probability measures on the unit simplex. A drawback is that it is almost surely
an atomic measure, meaning that it predicts the eventual occurrence of identical
data values. One might prefer a prior supported on the space of continuous
measures, although in this regard, the Dirichlet prior is more attractive than its
best known predecessor, namely a random distribution function on $[0, 1]$, defined
by Dubins and Freedman [DF66], which is almost surely singular-continuous.

The Dirichlet prior and the urn process representing it have been generalized
in a number of ways. A random prior on the sequence space
$E := \{0, \ldots, k-1\}^\infty$ is defined in [Fer74; MSW92] via an infinite
$k$-ary tree of urns. Each urn is a Pólya urn, and the rule for a single update
is as follows: sample from the urn at the root; if color $j$ is chosen, put an
extra ball of color $j$ in that urn, move to the urn that is the $j$-th child,
and repeat this sampling and moving infinitely often. Mapping the space $E$ into
any other space $S$ gives a prior on $S$. Taking $k = 2$, $S = [0, 1]$ and the
binary map $(x_j) \mapsto \sum_j x_j 2^{-j}$, one recovers the almost
surely singular-continuous prior of [DF66]. Taking $k = 1$, the tree is an
infinite ray, and the construction may be used to obtain the Beta-Stacy prior
[MSW00].

Another generalization formulates a natural conjugate prior on the transition
matrix of a reversible Markov chain. The edge-reinforced random walk,
defined in Section 2.1, is a Markov-exchangeable process (see the last sentence
of Section 2.2). This implies that the law of this sequence is a mixture of laws
of Markov chains. Given a set of initial weights on the edges, the mixing
measure may be explicitly described, as in Theorem 5.1 below. Diaconis and
Rolles [DR06] propose this family of measures, with initial weights as
parameters, as priors over reversible Markov transition matrices. Suppose we fix
such a prior, coming from initial weights $\{w(e)\}$, and we then observe a
single sample $X_0, \ldots, X_n$ of the unknown reversible Markov chain run for
time $n$. The posterior distribution will then be another measure from this
family, with weights

$$w'(e) := w(e) + \sum_{i=0}^{n-1} \mathbf{1}_{\{X_i, X_{i+1}\} = e} .$$

This is exactly analogous to Ferguson's use of Dirichlet priors for the
parameter of an IID sequence and yields, as far as I know, the only
computationally feasible Bayesian analysis of an unknown reversible Markov chain.
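
Computationally, the posterior update amounts to counting undirected edge traversals along the observed path. A minimal sketch follows, assuming edges are represented as `frozenset` pairs of states; that representation and the function name are illustrative choices, not taken from [DR06].

```python
from collections import Counter

def posterior_weights(weights, path):
    """Add one to the weight of an undirected edge for each traversal.

    `weights` maps frozenset({a, b}) to the initial weight w(e);
    `path` is the observed sample X_0, ..., X_n.
    """
    updated = Counter(weights)
    for a, b in zip(path, path[1:]):
        updated[frozenset((a, b))] += 1
    return dict(updated)
```

Note that an edge absent from `weights` would be created with weight 1 by this sketch; in the model such a transition has prior probability zero, so a careful implementation might prefer to raise an error instead.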



The Greenwood- Yule distribution and applications 



Distributions obtained from Pólya urn schemes have been proposed for a variety
of applications in which the urn mechanism is plausible at the micro-level. For
example, it is proposed in [Jan82] that the number of males born in a family of
a specified size $n$ might fit the distribution of a Pólya urn at time $n$ better
than a binomial $(n, p)$ if the propensity of having a male was not a constant
$p$ but varied according to family. Mackerro and Lawson [ML82] make a similar
case (with more convincing data) about the number of days in a given season that
are suitable for crop spraying. For more amusing examples, see [Coh76].

Consider a Pólya urn started with $R$ red balls and $n$ black balls and run to
time $\alpha n$. The probability that no new red balls get added during this
time is equal to

$$\prod_{j=0}^{\alpha n - 1} \frac{n + j}{n + R + j}$$

which converges as $n \to \infty$ to $(1 + \alpha)^{-R}$. The probability of
adding exactly $k$ red balls during this time converges as well. To identify the
limit, use exchangeability to see that this is $\binom{\alpha n}{k}$ times the
probability of choosing zero red balls in $\alpha n - k$ steps and then $k$ red
balls in a row. Thus the probability $p_{\alpha n}(k)$ of choosing exactly $k$
red balls is given by

$$p_{\alpha n}(k) = \binom{\alpha n}{k} \, p_{\alpha n - k}(0) \, \frac{R}{R + (1+\alpha)n - k} \cdots \frac{R + k - 1}{R + (1+\alpha)n - 1} .$$

The limiting distribution

$$p(k) = (1 + \alpha)^{-R} \, \frac{\alpha^k \, R(R+1) \cdots (R+k-1)}{(1 + \alpha)^k \, k!}$$

is a distribution with very fat tails known as the Greenwood-Yule distribution
(also, sometimes, the Eggenberger-Pólya distribution). Successive ratios
$p(k+1)/p(k)$ are of the form $c \, \frac{R+k}{k+1}$, which may be contrasted to
the successive ratios $\frac{c}{k+1}$ of the Poisson. Thus it is typically used
in models where one occurrence may increase the propensity for the next
occurrence. It is of historical interest because its use in modeling dependent
events precedes the paper [EP23] of Pólya's by several years: the distribution
was introduced by Greenwood and Yule [GY20] in order to model numbers of
accidents in industrial worksites. More recently it has been proposed as a model
for the number of crimes committed by an individual [Gre91], the spontaneous
mutation rate in filamentous fungi [BB03] and the number of days in a dry spell
[DGVEE05].
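
The limiting probabilities are convenient to compute through the successive ratios. The sketch below assumes the limit takes the negative binomial form $p(k) = (1+\alpha)^{-R} \alpha^k R(R+1)\cdots(R+k-1) / ((1+\alpha)^k k!)$, so that $p(k+1)/p(k) = c\,(R+k)/(k+1)$ with $c = \alpha/(1+\alpha)$; the function name is illustrative.

```python
def greenwood_yule_pmf(k, R, alpha):
    """p(k), built up from p(0) = (1 + alpha)^(-R) via successive ratios."""
    p = (1.0 + alpha) ** (-R)
    c = alpha / (1.0 + alpha)
    for j in range(k):
        p *= c * (R + j) / (j + 1)    # ratio p(j+1) / p(j)
    return p
```

Multiplying ratios rather than evaluating factorials directly keeps the computation free of overflow for large $k$.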



It is particularly interesting when the inference process is reversed. The
cross-section of the number of particles created in high speed hadronic
collisions is known experimentally to have a Greenwood-Yule distribution. This
has led physicists [YMN74; Min74] to look for a mechanism responsible for this,
perhaps similar to the urn model for Bose-Einstein statistics.



4.3. Sequential design 

The "two-armed" bandit, whose name seems already to have entered the folklore
between 1952 and 1957 [Rob52; BJK56], is a slot machine with two arms. One
arm yields a payoff of $1 with probability p and the other arm yields a payoff of
$1 with probability q. The catch is, you don't know which arm is which, nor do
you know p and q. The goal is to play so as to maximize your expected return,
or limiting average expected return. When p and q are unknown, it is not at all
obvious what to do. At the n-th step, assuming you have played both arms by
then, if you play the arm with the lower historical yield your immediate return
is sub-optimal. However, if you always play the arm with the higher historical
return, you could miss out forever on a much better action which misled you
with an initial run of bad luck.

The type of analysis needed to solve the two-armed bandit problem goes by 
the names of sequential analysis, adaptive control, or stochastic or op- 
timal control. Mathematically similar problems occur in statistical hypothesis 
testing and in the design of clinical trials. The formulation of what is to be 
optimized, and hence the solution to the problem, will vary with the particu- 
lar application. In the gambling problem, one wants to maximize the expected 
return, in the sense of the limiting average (or perhaps the total return in a 
finite time or infinite time with the future discounted). Determining which of 
two distributions has a greater mean seems almost identical to the two-armed 
bandit problem but the objective function is probably some combination of a 
cost per observation and a reward according to the accuracy of the inference. 
When designing a clinical trial, say to determine which of two treatments is more 
effective, there are two competing goals because one is simultaneously gather- 
ing data and treating patients. The most data is gathered in a balanced design, 
where each treatment is tried equally often. But there is an ethical dilemma each 
time an apparently less effective treatment is prescribed, and the onus is to keep
these to a minimum. A survey of both the statistical and ethical problems may
be found in [Ros96].



The two-armed bandit problem may be played with asymptotic efficiency. In
other words, letting $X_n$ be the payoff at time $n$, there is a strategy such
that

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} X_k = \max\{p, q\}$$

no matter what the values of $p$ and $q$. The first construction I am aware of is
due to [Rob52]. A number of papers followed upon that, giving more quantitative
solutions in the cases of a finite time horizon [Vog62a; Vog62b], under a finite
memory constraint [Rob56; SP65; Sam68], or in a Bayesian framework [Fel62; ...].

One way to formulate an algorithm for asymptotically optimal play is:
let $\{\epsilon_n\}$ be a given sequence of real numbers converging to zero; with
probability $1 - \epsilon_n$ at time $n$, play whichever arm up to now has the
greater average return, and with probability $\epsilon_n$ play the other arm.
Such an algorithm is described in [Duf96] and shown to be asymptotically
efficient.
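
A minimal sketch of such a strategy follows. The schedule $\epsilon_n = 1/n$, the rule of trying each arm once before comparing averages, and the function name are assumptions made for the illustration; they are not claimed to match the algorithm analyzed in the reference.

```python
import random

def epsilon_greedy(p, q, steps, seed=0):
    """Play a two-armed Bernoulli bandit; return the average payoff."""
    rng = random.Random(seed)
    success = (p, q)
    pulls = [0, 0]
    wins = [0, 0]
    total = 0
    for n in range(1, steps + 1):
        if 0 in pulls:
            arm = pulls.index(0)                 # try each arm once first
        else:
            means = [wins[i] / pulls[i] for i in range(2)]
            best = 0 if means[0] >= means[1] else 1
            eps = 1.0 / n                        # assumed schedule, tends to zero
            arm = 1 - best if rng.random() < eps else best
        reward = 1 if rng.random() < success[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        total += reward
    return total / steps
```

The schedule is chosen so that $\epsilon_n \to 0$ while $\sum_n \epsilon_n = \infty$, which keeps both arms in play indefinitely while the fraction of exploratory plays vanishes.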



In designing a clinical trial, it could be argued that the common good is 
best served by gathering the most data, since the harm to any finite number of 
patients who are given the inferior treatment is counterbalanced by the greater 
efficacy of treatment for all who follow. Block designs, for example alternating 
between the treatments, were once prevalent but suffer from being predictable 
by the physician and therefore not double blind.



In 1978, Wei and Durham [WD78] proposed the use of an urn scheme to
dictate the sequence of plays in a medical trial. Suppose two treatments have
dichotomous outcomes, one succeeding with probability $p$ and the other with
probability $q$, both unknown. In Wei and Durham's scheme there is an urn
containing at any time two colors of balls, corresponding to the two treatments.
At each time a ball is drawn and replaced, and the corresponding treatment
given. If the treatment succeeds, $\alpha$ identical balls and $\beta < \alpha$
balls of the opposite color are added; if the treatment fails, $\alpha$ balls of
the opposite color and $\beta$ balls of the same color are added. This is a GPU
with random reinforcement and mean reinforcement matrix

$$\begin{pmatrix} p\alpha + (1-p)\beta & (1-p)\alpha + p\beta \\ (1-q)\alpha + q\beta & q\alpha + (1-q)\beta \end{pmatrix} .$$

The unique equilibrium gives nonzero frequencies to both treatments but favors
the more effective treatment. The scheme is easy to execute, unpredictable, and
compromises between balance and favoring the superior treatment.
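
The equilibrium can be computed explicitly under the assumption, standard for irreducible generalized Pólya urns, that the limiting composition is the normalized left eigenvector of the mean reinforcement matrix for its largest eigenvalue. Here every row sums to $\alpha + \beta$, so that eigenvector is proportional to the off-diagonal entries. The function names are illustrative.

```python
def mean_matrix(p, q, alpha, beta):
    """Mean reinforcement matrix of the Wei-Durham urn (rows: treatment given)."""
    return [[p * alpha + (1 - p) * beta, (1 - p) * alpha + p * beta],
            [(1 - q) * alpha + q * beta, q * alpha + (1 - q) * beta]]

def equilibrium_share(p, q, alpha, beta):
    """Share of treatment 1 in the left eigenvector for eigenvalue alpha + beta."""
    a = mean_matrix(p, q, alpha, beta)
    # pi A = (alpha + beta) pi reduces to pi_1 * a[0][1] = pi_2 * a[1][0].
    return a[1][0] / (a[0][1] + a[1][0])
```

For instance, with $p = 0.8$, $q = 0.5$, $\alpha = 1$ and $\beta = 0$ the share of the better treatment is $0.5/0.7$, about 71%, so the allocation is tilted toward the superior treatment without abandoning the other.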

If one is relatively more concerned with reducing the number of inferior
treatments prescribed, then one seeks something closer to asymptotic efficiency.
It is possible to achieve this via an urn scheme as well. Perhaps the simplest
way is to reinforce by a constant $\alpha$ if the chosen treatment is effective,
but never to reinforce the treatment not chosen. The mean reinforcement matrix
for this is simply
$\begin{pmatrix} \alpha p & 0 \\ 0 & \alpha q \end{pmatrix}$. If $p = q$ we have
a Pólya urn with a random limit. If $p > q$ we obtain the diagonal urn of
Theorem 3.3: the urn population approaches a pure state consisting of only the
more effective treatment, with the chance of assigning the inferior treatment at
time $n$ being on the order of $n^{-|p-q|/p}$.

Surprisingly, the literature on urn schemes in sequential sampling, as recently
as the survey [Dir00], contains no mention of such a scheme. In [LPT04]
a stochastic approximation scheme is introduced. Their context is competing
investments, and they assume a division of the portfolio into two investments
$(X_n, 1 - X_n)$. Let $\{\gamma_n\}$ be a sequence of positive real numbers
summing to infinity. Each day, a draw from the urn determines which investment to
monitor: the first is monitored with probability $X_n$ and the second with
probability $1 - X_n$. If the monitored investment exceeds some threshold, then a
fraction $\gamma_n$ of the other investment is transferred into that investment.
The respective probabilities for the investments to perform well are unknown and
denoted by $p$ and $q$. Defining $T_n$ recursively by
$T_n/T_{n+1} = 1 - \gamma_n$, this is a time-dependent Pólya urn process (see
Section 3.2) with $a_n = T_{n+1} - T_n$, modified so that the reinforcement only
occurs if the chosen investment exceeds the threshold. If $\gamma_n = 1/n$
then $a_n = 1$ and one obtains the diagonal Pólya urn of the preceding paragraph.

When $p \ne q$, the only equilibria are at $X_n = 0$ and $X_n = 1$. The
equilibrium at the endpoint 0 is attracting when $p < q$ and repelling when
$p > q$, and conversely for the equilibrium at 1. The attractor must be the limit
of $\{X_n\}$ with positive probability, but can the repeller be the limit with
positive probability? The answer depends on the sequence $\{\gamma_n\}$. It is
shown in [LPT04] that for $\gamma_n \sim n^{-\alpha}$, the repeller can be a
limit with positive probability when $\alpha < 1$. Indeed, in this case it is
easy to see that with positive probability, the attractor is chosen only finitely
often. Since we assume $\sum_n \gamma_n = \infty$, this leaves interesting
cases near $\gamma_n \approx n^{-1}$. In fact Lamberton, Pagès and Tarrès
[LPT04, Corollary 2] show that for $\gamma_n = C/(n + C)$ and $p > q$, the
probability of converging to the repeller is zero if and only if $C < 1/p$.

4.4. Learning

A problem of longstanding interest to psychologists is how behavior is learned.
Consider a simple model where a subject faces a dichotomous choice: A or B.
After choosing, the subject receives a reward. How is future behavior influenced
by the reward? Here, the subjects may be animals or humans: in [Her70] pigeons
pecked one of two keys and were rewarded with food; in [SP67] the subjects were
rats and the reward was pleasant electrical stimulation; in [RE95] the subjects
were human and the reward monetary; in [ES54] the subjects were human and
success was its own reward. All of these experimenters wished primarily to
describe what occurred.

The literature on this sort of learning model is large, but results tend to
be mixed, with one model fitting one experiment but not generalizing well. I
will, therefore, be content here to describe two popular models and say where
they arise. A very basic model is that after a short while, the subject learns
which option is best and fixates on that option. According to Herrnstein [Her70,
page 243], this does not describe the majority of cases. A hypothesis over 100
years old [Tho98], called the law of effect, is that choices will be made with
probabilities in proportion to the total reward accumulated when making that
choice in the past. Given a (deterministic or stochastic) reward scheme, this then
translates into a GPU. In the economic context, the law of effect, also called the
matching law, is outlined by Roth and Erev [RE95]. They note a resemblance
to the evolutionary dynamics formulated by Maynard Smith [MS82], though the
models are not the same, and apply their model and some variants to a variety
of economic games.

Erev and Roth provide little philosophical justification for the matching law, 
though their paper has been very influential among evolutionary game theorists. 
When there are reasons to believe that decision making is operating at a simple 
level, such models are particularly compelling. In a study of decision making by 
individuals with brain damage stemming from Huntington's disease, Busemeyer
and Stout [BS02] compare a number of plausible models including a Bayesian
expected utility model, a stochastic model similar to the Markovian learning
models described in the next paragraph, and a Roth-Erev type model. They
estimate parameters and test the fit of each model, finding that the Roth-Erev
model consistently outperforms the others. See Section 4.6 for more general
justifications of this type of model.

A second type of learning model in the psychology literature is a Markovian
model with constant step size, which exhibits a stationary distribution
rather than convergence to a random limit. Norman [Nor74] reviews several
such models, the simplest of which is as follows. A subject repeatedly predicts
A or B (in this case, a human predicts whether or not a lamp will flash). The
subject's internal state at time $n$ is represented by the probability the
subject will choose A, and is denoted $X_n$. The evolution rules contain four
parameters, $\theta_1, \ldots, \theta_4 \in (0, 1)$. The four possible
occurrences are choose A correctly, choose A incorrectly, choose B incorrectly,
or choose B correctly, and the new value $X_{n+1}$ is respectively
$X_n + \theta_1 (1 - X_n)$, $(1 - \theta_2) X_n$, $X_n + \theta_3 (1 - X_n)$ or
$(1 - \theta_4) X_n$. Such models were introduced by [ES54; BM55]. The
corresponding Markov chain on $[0, 1]$ is amenable to analysis. One interesting
result [Nor74, Theorem 3.3] is when $\theta_1 = \theta_4 = \theta$ and
$\theta_2 = \theta_3 = 0$. Sending $\theta$ to zero while $n\theta \to t$ gives
convergence of $X_n$ to the time-$t$ distribution of a limiting diffusion.
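
The four update rules of the linear model translate directly into code. The function name and the encoding of the four occurrences by two booleans are illustrative choices.

```python
def update(x, chose_a, correct, theta):
    """One step of the four-parameter linear model; theta = (t1, t2, t3, t4).

    x is the current probability of choosing A; the return value is the
    new probability after the observed occurrence.
    """
    t1, t2, t3, t4 = theta
    if chose_a and correct:
        return x + t1 * (1 - x)       # chose A correctly
    if chose_a and not correct:
        return (1 - t2) * x           # chose A incorrectly
    if not chose_a and not correct:
        return x + t3 * (1 - x)       # chose B incorrectly
    return (1 - t4) * x               # chose B correctly
```

Each rule moves $X_n$ a fixed fraction of the way toward 0 or 1, which is why the state stays in $[0, 1]$ and the step size does not shrink with $n$.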



4.5. Evolutionary game theory

Evolutionary game theory is the marriage of the economic concepts of game
theory and Nash equilibria with the paradigm of Darwinian evolution originating
in biology. A useful reference is [HS98] (replacing the earlier work [HS88]),
which has separate introductions for economists and biologists. This subject
has exploded in the last several decades, with entire departments and institutes 
devoted to its study. Naturally, only a very small piece can be discussed here. 
I will present several applications that reflect the use of urn and reinforcement 
models, capturing the flavor of this area by giving a vignette rather than a care- 
ful history of ideas and methods in evolutionary game theory (and even then, 
it will take a few pages to arrive at any urn models). 



Economics meets biology 

Applications of evolutionary game theory arise both in economics and biology. 
This is because each discipline profits considerably from the paradigms of the 
other, as will now be discussed. 

A dominant paradigm in genetics is the stochastic evolution of a genome 
in a fitness landscape. The fitness landscape is a function from genotypes 
to the real numbers, measuring the adaptive fitness of the corresponding phe- 
notype in the existing environment. A variety of models exist for the change 
in populations of genotypes based on natural selection with respect to the fit- 
ness landscape. Often, randomness is introduced by mechanisms of mutation 
as well as by stochastic modeling of interactions with the environment. Much 
of the import of any particular model is in the details of the fitness landscape. 
Any realistic fitness landscape is hopelessly intractable and different choices of 
simplifications lead to models illuminating different aspects of evolution. 

Game theory enters the biological scene as one type of model for fitness, 
designed to capture some aspect of the behavior of interacting organisms. Game 
theoretic models focus on one or two behavioral attributes, usually modeled as 
expressions of single genes. Different genotypes correspond to different strategies 
in a single game. Fitness is modeled by the payoff of the given strategy against 
a mix of other strategies determined by the entire population. Selection acts 
through increased reproduction as a function of fitness. 

In economics, the theory of games and equilibria has been a longstanding 
dominant paradigm. Interactions between two or more agents are formalized by 
payoff matrices. Pure and mixed strategies are allowed, but it is generally held 
that the only strategies that should end up played by rational, informed agents
should be Nash equilibria⁴, that is, strategies that cannot be improved upon
given the stochastic mix of strategies in use by the other agents. Two-player 
games of perfect information are relatively straightforward under assumptions 
of rationality and perfect information. There is, however, often a distressing 
lack of correspondence between actual behavior and what is predicted by Nash 
equilibrium theory. 

Equilibrium selection 

Equilibrium theory can only predict that certain strategies will not be played, 
leaving open the question of selection among different equilibria. Thus, among 
the questions that motivated the introduction of evolutionary mechanisms are: 

• Equilibrium selection: Which of the equilibria will be played?

• Equilibrium formation: By what route does a population of players come
to an equilibrium?

• Equilibrium or not: Will an equilibrium be played at all?

Darwinism enters the economic scene as a means of incorporating bounded in- 
formation and rationality, explaining equilibrium selection, and modeling games 
repeated over time and among collections of agents. Assumptions of perfect in- 
formation and rationality are drastically weakened. Instead, one assumes that 
individual agents arrive with specific strategies, which they alter only due to 
data about how well these work (fitness) or to unlikely chance events (muta- 
tion). These models make sense in several types of situation. One is when agents 
are assumed to have low information, for instance in modeling adoption of new 
technology by consumers, companies, and industries (see the discussion in Section 4.1
of VHS versus Betamax, or Apple versus Mac). Another is when agents
are bound by laws, rules or protocols. These, by their nature, must be simple
and general⁵.

One early application of evolutionary game theory was to explain how play- 
ers might avoid a Pareto-dominated equilibrium. The ultimate form of this is 
the Prisoner's dilemma paradox, in which smart people (e.g., game theorists) 
must choose the only Nash equilibrium, but this is not Pareto-optimal and in 
fact is dominated by a non-equilibrium play chosen by uneducated people (e.g., 
mobsters). There are by now many solutions to this dilemma, most commonly 
involving repeated play. Along the lines of evolutionary game theory, large-scale 
interactive experiments have been run⁶ in which contestants are solicited to submit
computer programs that embody various strategies in repeated Prisoner's



⁴ Many refinements of this notion have been formulated, including subgame-perfect
equilibria, coordinated equilibria, etc.

⁵ Morals and social norms may be viewed as simple and general principles that may be
applied to complex situations. An evolutionary game theoretic approach to explaining these
may therefore seem inevitable, and indeed this is the thrust of recent works such as
[Sky04; Ale05].

⁶ The first was apparently run by Robert Axelrod, a political scientist at the University of
Michigan.

Dilemma, and then these are run against each other (in segments of 50 games
against each individual opponent) with actual stochastic replicator dynamics to
determine which strategies thrive in an evolving population⁷.

In the context of more general two-player games, Harsanyi and Selten intro- 
duced the concept of the risk-dominant equilibrium. This is a notion satisfying 
certain axioms, among which are naturality not only with respect to game- 
theoretic equivalences but also the best-reply structure. Consider symmetric 
$2 \times 2$ games of the form

\[ \begin{pmatrix} (a,y) & (0,0) \\ (0,0) & (b,z) \end{pmatrix}. \]

When $a > b$ and $z > y$ this is a prototypical Nash Bargaining Game. The strategy
pair $(1,1)$ is risk-dominant if $ay > bz$. For these games, Pareto-optimality
implies risk-dominance, but for other $2 \times 2$ games with multiple equilibria, the
risk-dominant equilibrium may not be Pareto-optimal.

Another development in the theory of equilibrium selection, dating back to 
around 1973, was Selten's trembling hand. This is the notion of stochastically 
perturbing a player's chosen strategy with a small probability $\varepsilon$. The idea is that
even in an obviously mutually beneficial Nash equilibrium, there is some chance
that the opponent will switch to another strategy by mistake (a trembling of
the hand), if not through malice or stupidity⁸. A number of notions of equilibria
stable under such perturbations arose, depending on the exact model for the
$\varepsilon$-perturbation, and the way in which $\varepsilon \to 0$. An early definition due to J. Maynard
Smith was formulated without probability. An evolutionarily stable strategy
is a strategy such that if it is adopted by a fraction $1 - \varepsilon$ of the population, then
for sufficiently small $\varepsilon$, any other strategy fares worse.

Replicator dynamics 

One of the earliest and most basic evolutionary game theoretic models is the
replicator. There are two versions: the (deterministic) replicator dynamical
system and the stochastic replicator. The deterministic replicator assumes that
pairs of players with strategy types $1, \ldots, m$ are repeatedly selected at random
from a large population, matched against each other in a fixed (generally
non-zero-sum) two-player game, and then given a selective advantage in accordance
with the outcome of the game. Formally, the model is defined as follows. Fix a
two-player (non-zero-sum) game with $m$ strategies for each player such that the
payoff to a player when playing $i$ against $j$ does not depend on whether the
player is Player 1 or Player 2; the matrix of these outcomes is denoted $M$. Let
$X(t)$ denote the normalized population vector, that is, $X_i(t)$ is the proportion
of the population at time $t$ that is of type $i$. For any normalized population
vector $y$, the expected outcome for strategy $i$ against a random pick

⁷ One simple strategy that did well in many of these experiments was "Tit for tat": do this
time what your opponent did last time.

⁸ It is best not to think too much about this when driving past oncoming traffic on a
two-lane highway.

from the population is $E(i,y) := \sum_{j=1}^m M_{ij} y_j$. Let $E'(i,y) := E(i,y) - E_0(y)$,
where $E_0(y) := \sum_{j=1}^m y_j E(j,y)$ is the average fitness for the population $y$; we
interpret $E'(i,y)$ as the selective advantage of type $i$ in population $y$ and let
$E'(y)$ denote the vector with components $E'(i,y)$. The replicator model is the
differential equation

\[ \frac{d}{dt} X(t) = \operatorname{diag}(X(t)) \, E'(X(t)). \tag{4.3} \]

The replicator equation was introduced by [TJ78] (though the ideas are present
in [MSP73; MS74]) and dubbed "replicator equation" by [SS83]. The books [MS82;
HS88] study it extensively.



The notion of evolutionarily stable strategies may be generalized to mixed 
strategies by means of replicator dynamics. Nash equilibria correspond to rest 
points for the replicator dynamics. An evolutionarily stable state is a population
vector that is an attractor for the replicator dynamics (see [HS98,
Theorem 7.3.2]).

The presence of the continuous parameter in replicator dynamics indicates 
that they are a large-population limit. There are a number of discrete systems 
achieving this limit, but one of the most natural is the stochastic replicator. 
Fix a positive integer $d$ and a $d \times d$ real matrix $M$. We view $M$ as the payoff
matrix (for the first player) in a two-player game with $d$ possible strategies, and
assume it is normalized to have nonnegative entries. At each integer time $t \geq 0$
there is a population of some size $N(t)$, consisting of individuals whose only
attributes are their type, the allowed types being $\{1, \ldots, d\}$. These individuals
are represented by an urn with balls of colors $1, \ldots, d$ numbering $N(t)$ altogether.
The population at time $t + 1$ is determined as follows. Draw $i$ and $j$ at random
from the population at time $t$ (with replacement) and return them to the urn
along with $M_{ij}$ extra balls of type $i$. The interpretation is that $M_{ij}$ is the fitness
of strategy $i$ against strategy $j$ and that the interaction between the two agents
causes the representation of type $i$ in the population to change on average by
an amount proportional to its fitness against the other strategy it encounters.
Repeating this will allow the average growth of type $i$ to be proportional to
its average success against all strategies weighted by their representation in the
population. One might expect an increase as well of $M_{ji}$ in type $j$, since the
interaction has, after all, effects on two agents; in the long run such a term
would simply double the rate of change, since an individual will on average be
chosen to be Player 1 half the time.
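The urn scheme just described can be written down in a few lines. The following Python sketch assumes a nonnegative payoff matrix given as a list of lists and a list of initial ball counts; the function names are illustrative and do not come from the cited literature.

```python
import random


def stochastic_replicator(M, counts, n_steps, seed=0):
    """Run the urn: draw types i and j with replacement, then add M[i][j] balls of type i."""
    rng = random.Random(seed)
    counts = list(counts)           # counts[i] = number of balls of type i
    types = range(len(counts))
    for _ in range(n_steps):
        # Two independent draws, each proportional to the current composition.
        i, j = rng.choices(types, weights=counts, k=2)
        counts[i] += M[i][j]        # only the first drawn type is reinforced
    return counts


def normalized(counts):
    """Normalized population vector (proportions of each type)."""
    total = sum(counts)
    return [c / total for c in counts]
```

For example, `normalized(stochastic_replicator([[1, 2], [3, 0]], [1, 1], 10000))` gives one random sample of the normalized population vector after 10000 interactions.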

Much of the preceding paragraph is drawn from S. Schreiber's article [Sch01],
in which further randomization is allowed ($M$ is the mean matrix for a random
increment); as we have seen before, this randomness is not particularly consequential;
enough randomness enters through the choice of two individual players.
Schreiber also allows $M_{ij} \in [-1, 0]$, which gives his results more general scope
than some of their predecessors.

The stochastic replicator is evidently a generalized Pólya urn and its mean
ODE is

\[ \frac{dZ}{dt} = \operatorname{diag}(Z(t)) \, M Z(t). \]

One may also consider the normalized population vector $X(t) := Z(t)/|Z(t)|$,
where $|Z(t)|$ is the sum of the components of $Z(t)$. This evolves, as promised,
by a (possibly time-changed) replicator equation

\[ \frac{dX}{dt} = \operatorname{diag}(X(t)) \, M X(t) - X(t) \left[ X(t)^T M X(t) \right]. \tag{4.4} \]

In other words, the growth rate of $X_i$ is $X_i \bigl( \sum_j M_{ij} X_j - \sum_{r,j} X_r M_{rj} X_j \bigr)$.
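To make the vector field in (4.4) concrete, here is a minimal Python sketch that evaluates its right-hand side and follows it with a crude forward Euler scheme. The function names, the Euler discretization and the example matrix are illustrative choices only; none of them comes from the sources cited here.

```python
def replicator_field(M, x):
    """Right-hand side of (4.4): diag(x) M x - x (x^T M x)."""
    d = len(x)
    fitness = [sum(M[i][j] * x[j] for j in range(d)) for i in range(d)]  # (M x)_i
    mean_fitness = sum(x[i] * fitness[i] for i in range(d))              # x^T M x
    return [x[i] * (fitness[i] - mean_fitness) for i in range(d)]


def euler_path(M, x0, dt, n_steps):
    """Forward Euler integration of the replicator equation from x0."""
    x = list(x0)
    for _ in range(n_steps):
        v = replicator_field(M, x)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

The components of the field sum to zero whenever `x` sums to one, so the simplex is preserved. For the matrix `[[1, 2], [3, 0]]` the two fitnesses agree at the even mix, and `euler_path([[1, 2], [3, 0]], [0.9, 0.1], 0.01, 5000)` ends up very close to `[0.5, 0.5]`.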

The early study of replicator dynamics concentrated on determining trajectories
of the dynamical systems, formulating a notion of stability (such as the
evolutionarily stable strategy of [MSP73]), and applying these to theoretically
interesting biological systems (see especially [MS82]).

The stochastic replicator process fits into the framework of Benaïm et al.
described in Section 2.5 (except for the possibility of extinction when $M_{ij}$ is
allowed to be negative). Schreiber [Sch01, Theorem 2.2] proves a version of
Theorem 2.13 for replicator processes, holding on the event of nonextinction.
This allows him to derive a version of Corollary 2.15 for replicator processes.

Theorem 4.2 ([Sch01, Corollary 3.2]). Let $X := \{X_n\}$ be the normalized population
vector for a replicator process with positive expected growth. Then almost
surely on the event of nonextinction, the limit set $L(X)$ satisfies the three
equivalent properties in Proposition 2.10. □

It follows from the attractor convergence theorem 2.16 that any attractor
in the dynamical system attracts the replicator process with positive probability
[BST04, Theorem 7].



Completing the circle of ideas, Schreiber has applied his results to a biological
model. In [SL96], data is presented showing that three possible color patterns
and associated behaviors among the side-blotched lizard Uta stansburiana have
a non-transitive dominance order in terms of success in competing for females⁹.
Furthermore, the evolution of population vectors over a six-year period showed
a cycle predicted by the dynamical system models of Maynard Smith, which are
cited in the paper. Schreiber then applies replicator process urn dynamics. These
are the same as in the classic Rock-Paper-Scissors example analyzed in [HS98]
and they predict initial cycling followed by convergence to an even mix of all
three types in the population.



Fictitious play 

A quest somewhat related to the problem of explaining equilibrium selection is 
the problem of finding a mechanism by which a population might evolve toward 
any equilibrium at all in a game with many strategies. In other words, the 
emphasis moves from explaining behavior in as Darwinistic a manner as possible 
to using the idea of natural selection to formulate a coordination algorithm by 
means of which relatively uninformed agents might adaptively find good (i.e., 

⁹ This is evidently the first reported manifestation of this theoretical possibility and I highly
recommend reading the brief article to see the details.

equilibrium) strategies. Such algorithms are quite important in computer science
(internet protocols for use of shared channels, coordination protocols for parallel
processing, and so forth).



In 1951, G. Brown [Bro51] proposed a mechanism known as fictitious play.
A payoff matrix $M$ is given for a two-player, zero-sum game. Two players play
the game repeatedly, with each player choosing at time $n + 1$ an action that
is optimal under the assumption that the other player will play according to
the past empirical distribution. That is, Player 1 plays $i$ on turn $n + 1$ where
$i$ is a value of $x$ maximizing the average payoff $n^{-1} \sum_{k=1}^n M_{x, y_k}$, where $y_1, \ldots, y_n$
are the previous plays of Player 2; Player 2 plays analogously. Robinson [Rob51]
showed that for each player, the empirical distribution of their play converges
to an optimal mixed strategy¹⁰.
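Fictitious play for a zero-sum game takes only a few lines to implement. The sketch below is an illustration under stated assumptions rather than Brown's original formulation: both players update simultaneously, ties are broken in favor of the lowest-numbered action, and the first-round actions are supplied arbitrarily through the hypothetical parameter `first`.

```python
def fictitious_play(M, n_rounds, first=(0, 0)):
    """Fictitious play for the zero-sum game whose payoff to Player 1 is M[x][y].

    Returns the empirical play frequencies of Player 1 and Player 2.
    """
    rows, cols = len(M), len(M[0])
    row_counts = [0] * rows   # how often Player 1 has played each action
    col_counts = [0] * cols   # how often Player 2 has played each action
    i, j = first              # assumption: arbitrary opening actions
    for _ in range(n_rounds):
        row_counts[i] += 1
        col_counts[j] += 1
        # Player 1 maximizes total (equivalently average) payoff against
        # Player 2's empirical distribution.
        i_next = max(range(rows),
                     key=lambda x: sum(M[x][y] * col_counts[y] for y in range(cols)))
        # Player 2 minimizes Player 1's payoff against Player 1's empirical distribution.
        j_next = min(range(cols),
                     key=lambda y: sum(M[x][y] * row_counts[x] for x in range(rows)))
        i, j = i_next, j_next
    return ([c / n_rounds for c in row_counts],
            [c / n_rounds for c in col_counts])
```

For matching pennies, `fictitious_play([[1, -1], [-1, 1]], 10000)` returns frequencies close to one half for both players, in line with Robinson's theorem, although the sequence of actual plays cycles rather than looking random.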

Fictitious play makes sense for non-zero sum games as well, and for games 
with more than two players, provided it is specified whether the Bayesian 
assumption is that each other player independently plays from his empirical 
distribution or whether the joint play of the other players is from the joint 
empirical distribution. Robinson's result was extended to non-zero-sum $2 \times 2$
games by [Miy61], but then shown to fail in general by Shapley [Sha64] (a two-player,
three-strategy counterexample; see also [Jor93] for a counterexample
with dichotomous strategies but three players). There are, however, subclasses
of non-zero-sum games for which fictitious play has been shown to converge
to Nash equilibria. These include potential games [MS96] (every player receives
the same payoff), super-modular games [MR90] (the payoff matrix is
super-modular) and games with interior evolutionarily stable strategies.

Although originally proposed as a computational mechanism, fictitious play
became popular among behavioral modelers. However, when interpreted as a
psychological micro-level mechanism, there are troubling aspects to fictitious play.
For a two-player zero-sum game with a unique Nash equilibrium, while the
marginals will converge to a saddle point, the plays of the two players may be
entirely coordinated, so that actual payoffs may not have the correct long-run
average. When there are more than two players, modeling the opponents' future
plays as independent picks from empirical marginals seems overly naive
because the empirical joint distribution is known. (The coordination problems
that can arise with two players can be thought of in the same way: a failure to
model dependence between the opponent's plays and one's own plays.) Fudenberg
and Kreps [FK93] address these concerns via a greatly generalized framework of
optimum response. Their chief concern is to give a notion of convergence to Nash
equilibrium that precludes the kind of coordination problems mentioned above.
In doing so, they take up the notion, due to Harsanyi [Har73], of stochastically
perturbed best response, in which each player has independent noise added to
the utilities during the computation of the optimum response. They then extend
Miyasawa's result on convergence of fictitious play for $2 \times 2$ non-zero-sum
games to the setting of stochastic fictitious play, under the assumption of a


¹⁰ The same is true with alternating updates, and in fact convergence appears to be faster
in that case.

unique Nash equilibrium [FK93, Proposition 8.1].

Stochastically perturbed fictitious play fits directly into the stochastic
approximation framework. While the stochastic element caused technical difficulties
for Fudenberg and Kreps, for whom the available technology was limited
to pre-1990 works such as [KC78; Lju77], this same element fits nicely into the
framework of Benaïm et al. to eliminate unstable trajectories. The groundwork
for an analysis in the stochastic approximation framework was laid in [BH99a].
They obtain the usual basic conclusions: the system converges to chain recurrent
sets for the associated ODE and attractors attract with positive probability.
They give examples of failure to converge, including the stochastic analogue of
Jordan's $2 \times 2 \times 2$ counterexample. They then begin to catalogue cases where
stochastic fictitious play does converge. Under suitable nondegeneracy assumptions
on the noise, they extend [FK93, Proposition 8.1] to allow at most countably
many Nash equilibria. Perhaps more interesting is their introduction of a
class of two-player $n \times 2$ games they call generalized coordination games
for which they are able to obtain convergence of stochastic fictitious play. This
condition is somewhat restrictive, but in a subsequent work [BH99b], they formulate
a simpler and more general condition. Let $F$ denote the vector field of
the stochastic approximation process associated with stochastically perturbed
fictitious play for a given $m$-player (non-zero-sum) game. Say that $F$ is cooperative
if $\partial F_i / \partial x_j \geq 0$ for every $i \neq j$. For example, it turns out that
the vector field for any generalized coordination game is cooperative. Under a
number of technical assumptions, they prove the following result for any cooperative
stochastic approximation. Note though, that this is proved for stochastic
approximations with constant step size $\varepsilon$, as $\varepsilon \to 0$; this is in keeping with the
prevailing economic formulations of perturbed equilibria, but in contrast to the
usual stochastic approximation framework.

Theorem 4.3 ([BH99b, Theorem 1.5]). If $F$ is cooperative then as $\varepsilon \to 0$, the
empirical measure of the stochastic approximation process converges in probability
to the set of equilibria of the vector field $F$. If in addition either $F$ is real
analytic or has only finitely many stable equilibria, then the empirical distribution
converges to an asymptotically stable equilibrium. □

Remark. This result requires constant step size (2.6) but is conjectured to hold
under (2.7)-(2.8); see [Ben00, Conjecture 2.3]. The difficulty is that the convergence
theorems for general step sizes require smoother unstable manifolds than
can be proved using the cooperation hypothesis.

Benaïm and Hirsch then show that this result applies to any $m$-player generalized
coordination game with stochastic fictitious play with optimal response
determined as in the framework of [FK93], provided that the response map is
smooth (which requires some noise). Generalized coordination games by definition
have only two strategies per player, so the extension of these results to
multi-strategy games was left open. At the time of writing, the final installment
in the story of stochastic fictitious play is the extension by Hofbauer and
Sandholm of the non-stochastic convergence results (for potential games,
supermodular games and games with an internal evolutionarily stable strategy)
to the stochastic setting [HS02]. Forthcoming work of Benaïm, Hofbauer and
Sorin [BHS05; BHS06] replaces the differential equation by a set-valued differential
inclusion in order to handle fictitious play with imperfect information
or with discontinuous $F$.



4.6. Agent-based modeling 



In agent-based models, according to [Bon02], "A system is modeled as a
collection of autonomous decision-making entities called agents, [with each] agent
individually assessing its situation and making decisions on the basis of a set 
of rules." A typical example is a graph theoretic model, where the agents are 
vertices of a graph and at each time step, each agent chooses an action based on 
various characteristics of its neighbors in the graph; these actions, together with 
external sources of randomness, determine outcomes which may alter the char- 
acteristics of the agents. Stochastic replicator dynamics fall within this rubric, 
as do a number of the other processes already discussed. The boundaries are 
blurry, but this section is chiefly devoted to agent-based models from the social 
sciences, in which some sort of graph theoretic structure is imposed. 

Analytic intractability is the rule rather than the exception for such models. 
The recent boom in agent-based modeling is probably due to the emergence 
of fast computers and of software platforms specialized to perform agent-based 
simulation. One scientific utility for such models is to give simple explanations 
for complex phenomena. Another motivation comes from psychology. Even in 
situations where people are capable of some kind of rational game-theoretic 
computation, evidence shows that actual decision mechanisms are often much 
more primitive. Brain architecture dictates that the different components of a 
decision are processed by different centers, with the responses then chemically or
electrically superimposed (see for example [AHS05]). Three realistic components
of decision making, captured better by agent-based models than by rational
choice models, are noted by Flache and Macy [FM02, page 633]:



• Players develop preferences for choices associated with better outcomes 
even though the association may be coincident, causally spurious, or su- 
perstitious. 

• Decisions are driven by the two simultaneous and distinct mechanisms 
of reward and punishment, which are known to operate ubiquitously in 
humans. 

• Satisficing, or persisting in a strategy that yields a positive but not optimal
outcome, is common and indicates a mechanism of reinforcement
rather than optimization.

Agent-based models now abound in a variety of social science disciplines,
including psychology, sociology [BL03], public health [EL04], and political science
[OMH+04]. The discussion here will concentrate on a few game-theoretic
applications in which rigorous results have been obtained.

A number of recent analyses have centered on a two-player coordination game
similar to Rousseau's stag hunt. Each player can choose to hunt rabbits or
stags. The payoff is bigger for a stag but the stag hunt is successful only if both
players hunt stag, whereas rabbit hunting is always successful. More generally,
consider a payoff matrix as follows:

\[ \begin{pmatrix} (a,a) & (c,d) \\ (d,c) & (b,b) \end{pmatrix} \tag{4.5} \]



When $a > d$ and $b > c$, the outcomes $(a, a)$ and $(b, b)$ are both Nash equilibria.
Assume these inequalities, and without loss of generality, assume $a > b$. Then
if $a - d < b - c$, the outcome $(b, b)$ is the risk-dominant equilibrium, whereas
$(a, a)$ is always the unique Pareto-optimal equilibrium.
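A small helper makes the two notions easy to compare on concrete numbers. The Python sketch below uses the standard Harsanyi-Selten comparison of deviation losses for symmetric $2 \times 2$ coordination games: the loss $a - d$ from unilaterally leaving $(a, a)$ is compared with the loss $b - c$ from unilaterally leaving $(b, b)$, and the equilibrium with the larger loss is risk-dominant. The function name and return format are illustrative.

```python
def classify_coordination_game(a, b, c, d):
    """Classify the symmetric game (4.5), assuming a > d and b > c."""
    if not (a > d and b > c):
        raise ValueError("need a > d and b > c for two pure Nash equilibria")
    loss_1 = a - d   # cost of unilaterally deviating from (a, a)
    loss_2 = b - c   # cost of unilaterally deviating from (b, b)
    if loss_1 > loss_2:
        risk = "(a,a)"
    elif loss_1 < loss_2:
        risk = "(b,b)"
    else:
        risk = None  # neither equilibrium risk-dominates
    if a > b:
        pareto = "(a,a)"
    elif b > a:
        pareto = "(b,b)"
    else:
        pareto = None
    return {"risk_dominant": risk, "pareto_optimal": pareto}
```

With the stag hunt payoffs used later in Theorem 4.6 ($a = 1$, $b = 0.75$, $c = 0$, $d = 0.75$), the helper reports that hunting rabbits is risk-dominant while hunting stag is Pareto-optimal.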

In 1993, Kandori, Mailath and Rob [KMR93] analyzed a very general class of
evolutionary dynamics for populations of $N$ individuals associated with the two
strategy types. The class included the following extreme version of stochastic
replicator dynamics: each player independently with probability $1 - 2\varepsilon$ changes
type to whatever strategy type was most successful against the present population
mix, and with probability $2\varepsilon$ resets the type according to the result of
independent fair coins. In the case of a game described by (4.5) they showed that
the resulting Markov chain always converged to the risk-dominant equilibrium
in the sense that the chain had a stationary measure $\mu_{N,\varepsilon}$ satisfying:

Theorem 4.4 ([KMR93, Theorem 3]). As $\varepsilon \to 0$ with $N$ fixed and sufficiently
large, $\mu_{N,\varepsilon}$ converges to the entire population playing the risk-dominant
equilibrium. □

Proof: Assume without loss of generality that $a - d > b - c$, that is, that
strategy 1 is risk-dominant. There is an embedded two-state Markov chain,
where state 1 contains all populations where the number of type 1 players is
at least $\alpha N$, and $\alpha$ is the threshold proportion for strategy 1 to be superior to
strategy 2 against such a population. Due to $a - d > b - c$, we know $\alpha N < N/2$. Going from
state 2 to state 1 occurs exactly when there are at least $\alpha N$ "mutations" (types
chosen by coin-flip) and going from state 1 to state 2 occurs when there are at
least $(1 - \alpha) N$ mutations. The ratio of the stationary measures of state 1 to state 2
goes to the ratio of these two probabilities, which goes to infinity. □
Unfortunately, the waiting time to get from either state to the other is
exponential in $N \log(1/\varepsilon)$, meaning that for many realistic parameter values, the
population, if started at the sub-optimal equilibrium, does not have time to
learn the better equilibrium. This many simultaneous mutations are as rare as
all the oxygen molecules suddenly moving to the other side of the room (well,
not quite). Ellison [Ell93] proposes a variant. Let the agents be labeled by the
integers modulo $N$, and for fixed $k < N/2$, let $i$ and $j$ be considered neighbors if
their graph distance is at most $k$. Ellison's dynamics are the same as in [KMR93]
except that each agent with probability $1 - 2\varepsilon$ chooses the best play against the
reference population consisting of that individual together with its $2k$ neighbors.
The following result shows that when global interactions are replaced by local
interactions, the population learns the optimal equilibrium much more rapidly.

Theorem 4.5 ([Ell93, Theorem 3]). For fixed $k$ and sufficiently small $\varepsilon$, as
$N \to \infty$ the expected time from any state to a state with most players of type 1
remains constant.

Proof: Let $j \leq k$ be such that $j$ out of $2k + 1$ neighbors of type 1 is sufficient
to make strategy 1 optimal. Once there are $j$ consecutive players of type 1, the
size of the interval of players of type 1 (allowing an $\varepsilon$ fraction of errors) will
tend to increase by roughly $2(k - j - \varepsilon N)$ at each turn. The probability of the
interval $r + 1, \ldots, r + j$ all turning to type 1 in one step is small but nonzero,
so for sufficiently large $N$, such an interval arises immediately. □
The issue of how people might come to choose the superior $(a, a)$ in this case
has been of longstanding concern to game theorists. In [SP00], a new evolutionary
dynamic is introduced. A two-player game is fixed, along with a population
of players labeled $1, \ldots, N$. Each player is initially assigned a strategy type.
Positive weights $w(i, j, 1)$ are assigned as well, usually all equal to 1. The novel
element to the model is the simultaneous evolution of network structure with
strategy. Specifically, the network at time $t$ is given by the collection of weights
$w(i, j, t)$ representing propensities for player $i$ to interact with player $j$ at time
$t$. At each time step, each player $i$ chooses a partner $j$ independently at random
with probabilities proportional to $w(i, j, t)$, then plays the game with the partner.
After this, $w(i, j, t + 1)$ is set equal to $w(i, j, t) + u$ and $w(j, i, t + 1)$ is set
equal to $w(j, i, t) + u'$, where $u$ and $u'$ are the respective utilities obtained by
players $i$ and $j$. (Note that each player plays at least once in each round, but
more than once if the player is chosen as partner by one or more of the other
players.)

In their first model, Skyrms and Pemantle take the strategy type to be fixed 
and examine the results of evolving network structure. 

Theorem 4.6 ([SP00, Theorem 6]). Consider a network of $2n$ players in Rousseau's
Stag Hunting game given by the payoff matrix

\[ \begin{pmatrix} (1, 1) & (0, 0.75) \\ (0.75, 0) & (0.75, 0.75) \end{pmatrix} \]

with $2k > 0$ stag hunters and $2(n - k) > 0$ rabbit hunters. Under the above
network evolution rules, with no evolution or mutation of strategies, as $t \to \infty$,
the probability approaches 1 that all stag hunters choose stag hunters and all
rabbit hunters choose rabbit hunters.

Proof: If $i$ is a stag hunter and $j$ is a rabbit hunter then $w(i, j, t)$ remains 1
for all time; hence stag hunters do not choose rabbit hunters in the limit. The
situation is more complicated for $w(j, i, t)$, since rabbit hunters get reinforced no
matter whom they choose or are chosen by. However, if $A$ denotes the set of stag
hunters and $Z(j, t)$ denotes the probability $\sum_{i \in A} w(j, i, t) / \sum_i w(j, i, t)$ that $j$
will choose a stag hunter at time $t$, then it is not hard to find $\lambda, \mu > 0$ such that
$\exp(\lambda Z(j, t) + \mu \log t)$ is a supermartingale, which implies that $Z(j, t) \to 0$ (in
fact, exponentially fast in $\log t$). □

Further results via simulation show that when each agent after each round
decides with a fixed probability $\varepsilon > 0$ to switch to the strategy that is optimal
against the present population, then all agents converge to a single type. However,
it is random which type. When the evolution of strategy was slow (e.g.,
$\varepsilon = 1/100$), the system usually ended up at the optimal equilibrium (everyone hunts
stag) but when the evolution of strategy was more rapid (e.g., $\varepsilon = 1/10$), the
majority (78%) of the simulations resulted in the maximin equilibrium where
everyone hunts rabbits. Evidently, more rapid evolution of strategy causes the
system to mirror the stochastic replicator models in which the risk-dominant
equilibrium is always chosen.
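The fixed-strategy network dynamics of Theorem 4.6 are simple to simulate. The Python sketch below is an illustration and not the code used in [SP00]. It assumes that all partner choices in a round are made from the weights at the start of that round, that initial weights equal 1, and that a player never visits itself (self-weights are set to 0); strategy switching is not modeled.

```python
import random

STAG, RABBIT = 0, 1
# PAYOFF[own][other]: payoff to a player of type `own` meeting type `other`,
# taken from the matrix in Theorem 4.6.
PAYOFF = [[1.0, 0.0], [0.75, 0.75]]


def simulate_network(types, n_rounds, seed=0):
    """Reinforce visiting weights w[i][j] with fixed strategy types."""
    rng = random.Random(seed)
    n = len(types)
    w = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    for _ in range(n_rounds):
        # Every player picks a partner with probability proportional to its weights.
        partners = [rng.choices(range(n), weights=w[i], k=1)[0] for i in range(n)]
        for i, j in enumerate(partners):
            w[i][j] += PAYOFF[types[i]][types[j]]   # visitor is reinforced by its payoff
            w[j][i] += PAYOFF[types[j]][types[i]]   # host is reinforced by its payoff
    return w


def prob_visits_stag(w, types, j):
    """Probability that player j chooses a stag hunter under the weights w."""
    to_stag = sum(w[j][i] for i in range(len(types)) if types[i] == STAG)
    return to_stag / sum(w[j])
```

In line with the proof, the weight from a stag hunter to a rabbit hunter never moves from 1, while the corresponding probability for a rabbit hunter, `prob_visits_stag(w, types, j)`, decays only slowly.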

4.7. Miscellany

Splines and interpolating curves 

Computer-aided drawing programs often provide interpolated curves. A finite
sequence $x_0, \ldots, x_n$ of control points in $\mathbb{R}^d$ is specified, and a curve
$\{f(t) : 0 \le t \le 1\}$ is generated which in some sense approximates the polygonal
path $g(t)$ defined to equal $x_k + (nt - k)(x_{k+1} - x_k)$ for $k/n \le t \le (k+1)/n$. In many
cases, the formula for producing $f$ is

$$f(t) = \sum_{k=0}^{n} B_{n,k}(t)\, x_k .$$

Depending on the choice of $\{B_{n,k}(t)\}$, one obtains some of the familiar blending
curves: Bezier curves, B-splines, and so forth.

Goldman [Gol85] proposes a new family of blending functions. Consider a
two-color Polya urn with constant reinforcement $c > 0$, initially containing $t$ red
balls and $1 - t$ black balls. Let $B_{n,k}(t)$ be the probability of obtaining exactly
$k$ red balls in the first $n$ trials. The functions $\{B_{n,k}\}$ are shown to have almost
all of the requisite properties for families of blending functions. In particular,

(i) $\{B_{n,k}(t) : k = 0, \ldots, n\}$ are nonnegative and sum to 1, implying that the
interpolated curve is in the convex hull of the polygonal curve;

(ii) $B_{n,k}(t) = B_{n,n-k}(1 - t)$, implying symmetry under reversal;

(iii) $B_{n,k}(0) = \delta_{k,0}$ and $B_{n,k}(1) = \delta_{k,n}$, implying that the curve and polygon
have the same endpoints (useful for piecing together curves);

(iv) $\sum_{k=0}^{n} k B_{n,k}(t) = nt$, implying that the curve is a line when $x_{k+1} - x_k$ is
independent of $k$;

(v) The curve is less wiggly than the polygonal path: for any vector $v$, the
number of sign changes of $f(t) \cdot v$ is at most the number of sign changes
of $g(t) \cdot v$;

(vi) Given $P_0, \ldots, P_n$ there are $Q_0, \ldots, Q_{n+1}$ that reproduce the same curve
$f(t)$ with the same parametrization;

(vii) Any segment $\{f(t) : a \le t \le b\}$ of the curve with control points $P_0, \ldots, P_n$
is reproducible as the entire curve corresponding to control points $Q_0, \ldots, Q_n$,
where the parametrization may differ but $n$ remains the same.

There is of course an explicit formula for the polynomials $B_{n,k}(t)$. This
generalizes the Bernstein polynomials, which are obtained when the reinforcement
parameter $c$ is zero. However, the urn model pulls its weight in the sense that
verification of many of the features is simplified by the urn model interpretation.
For example, the first fact translates simply to the fact that for fixed $n$ and $t$
the quantities $B_{n,k}(t)$ are the probabilities of the $n + 1$ possible values of a random
variable.
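
The urn interpretation can be checked numerically. The sketch below is not taken from [Gol85]; it assumes the standard Polya-Eggenberger formula for the number of red draws, $B_{n,k}(t) = \binom{n}{k} \prod_{i<k}(t + ic) \prod_{j<n-k}(1 - t + jc) / \prod_{m<n}(1 + mc)$, and the function names `polya_blend` and `blended_point` are illustrative only.

```python
from math import comb


def polya_blend(n, k, t, c):
    """Probability of exactly k red balls in the first n draws from a Polya urn
    that starts with t red and 1 - t black balls and adds c balls of the drawn
    color after each draw (Polya-Eggenberger formula)."""
    red = 1.0
    for i in range(k):
        red *= t + i * c
    black = 1.0
    for j in range(n - k):
        black *= (1.0 - t) + j * c
    total = 1.0
    for m in range(n):
        total *= 1.0 + m * c
    return comb(n, k) * red * black / total


def blended_point(points, t, c):
    """Evaluate f(t) = sum_k B_{n,k}(t) x_k for control points x_0, ..., x_n."""
    n = len(points) - 1
    dim = len(points[0])
    return tuple(
        sum(polya_blend(n, k, t, c) * points[k][d] for k in range(n + 1))
        for d in range(dim)
    )
```

Setting `c = 0` in `polya_blend` recovers the Bernstein polynomial $\binom{n}{k} t^k (1-t)^{n-k}$, and properties (i) to (iv) can be confirmed for particular values of `n`, `t` and `c`.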

In a subsequent paper [Gol88a], Goldman goes on to represent the so-called
Beta-spline functions of [Bar81] via a somewhat more complicated time-varying
Friedman urn model. Classical B-splines have a similar representation, which
has consequences for the closeness of approximations by B-splines and Bernstein
polynomials [Gol88b].



Image reconstruction 



An interesting application of a network of Polya urns is described in [BBA99].
The object is to reconstruct an image, represented in a grid of pixels, each of
which contains a single color from a finite color set $\{1, \ldots, k\}$. Some coherence
of the image is presumed, indicating that pixels dissimilar to their neighbors are 
probably errors and should be changed to agree with their neighbors. Among 
the existing methods to do this are maximum likelihood estimators, Markov 
random field models with Gibbs-sampler updating, and smoothing via wavelets. 
Computation of the MLE may be difficult, the Gibbs sampler may converge too 
slowly, and wavelet computation may be time-consuming as well. 

Banerjee et al. propose letting the image evolve stochastically via a network
of urns. This is fast, parallelizable, and should capture the qualitative features of
smoothing. The procedure is as follows. There is an urn for each pixel. Initially,
urn $x$ contains $x(j)$ balls of color $j$, where

$$x(j) = \sum_{y \neq x} \frac{\delta(y,j)}{d(x,y)}$$

and $\delta(y,j)$ is one if pixel $y$ is colored $j$ and zero otherwise. In other words,
the initial contents are determined by the empirical distribution of colors near
$x$, weighted by inverse distance. Define a neighborhood structure: for each $x$
there is a set of pixels $N(x)$; this may for example be nearest neighbors or all
pixels up to a certain distance from $x$. The update rule for urn $x$ is to sample
from the combined urn of all elements of $N(x)$ and add a constant number $\Delta$
of balls of the sampled color to urn $x$. This may be done simultaneously for all
$x$, sequentially, or by choosing $x$ uniformly at random. After a long time, the
process halts and the output configuration is chosen by taking the plurality color
at each pixel. The mathematical analysis is incomplete, but experimental data
show that this procedure outperforms a popular relaxation labeling algorithm
(the urn scheme is faster and provides better noise reduction).
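
The procedure just described can be sketched as follows. This is a minimal illustration rather than the implementation of [BBA99]. It makes several assumptions for brevity: colors are labeled $0, \ldots, k-1$; $N(x)$ is the square window of a given `radius` around $x$, excluding $x$ itself; the initial inverse-distance sum is restricted to that same window; pixels are updated one at a time, chosen uniformly at random; and the image has at least two pixels. The name `urn_smooth` and all of its parameters are hypothetical.

```python
import random


def urn_smooth(image, num_colors, delta=1.0, sweeps=20, radius=1, seed=0):
    """Smooth a color-labeled image with a network of Polya urns, one per pixel."""
    rng = random.Random(seed)
    rows, cols = len(image), len(image[0])
    pixels = [(r, c) for r in range(rows) for c in range(cols)]

    def neighbors(p):
        # Assumed neighborhood N(x): square window of the given radius, minus x.
        r0, c0 = p
        return [
            (r, c)
            for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1))
            for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1))
            if (r, c) != p
        ]

    nbrs = {p: neighbors(p) for p in pixels}

    # Initial contents: colors of nearby pixels, weighted by inverse distance.
    urns = {}
    for p in pixels:
        counts = [0.0] * num_colors
        for q in nbrs[p]:
            dist = ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
            counts[image[q[0]][q[1]]] += 1.0 / dist
        urns[p] = counts

    # Update: sample a color from the combined urn of N(x), add delta balls to urn x.
    for _ in range(sweeps * len(pixels)):
        p = rng.choice(pixels)
        combined = [sum(urns[q][j] for q in nbrs[p]) for j in range(num_colors)]
        color = rng.choices(range(num_colors), weights=combined)[0]
        urns[p][color] += delta

    # Output: plurality color in each urn.
    return [
        [max(range(num_colors), key=lambda j: urns[(r, c)][j]) for c in range(cols)]
        for r in range(rows)
    ]
```

A uniformly colored image is a fixed point of this scheme, since every urn then contains a single color. The sequential random update could be replaced by a simultaneous update of all urns, which is the variant that parallelizes.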

5. Reinforced random walk

In 1987 [Dia88] (see also [CD87]), Diaconis introduced the following process,
known now as edge-reinforced random walk or ERRW. A walker traverses
the edges of a finite graph. Initially any edge incident to the present location
is equally likely, but as the process continues, the likelihood for the walker to
choose an edge increases with each traversal of that edge, remaining proportional
to the weight of the edge, which is one more than the number of times the edge
has been traversed in either direction.

Formally, let $G := (V, E)$ be any finite graph and let $v \in V$ be the starting
vertex. Define $X_0 = v$ and $W(e, 0) = 1$ for all $e \in E$. Inductively define
$\mathcal{F}_n := \sigma(X_0, \ldots, X_n)$, $W(\{y,z\}, n) = W(\{y,z\}, n-1) + \mathbf{1}(\{X_{n-1}, X_n\} = \{y,z\})$, and
let

$$\mathbb{P}(X_{n+1} = w \mid \mathcal{F}_n) = \frac{W(\{X_n, w\}, n)}{\sum_z W(\{X_n, z\}, n)}$$

if $w$ is a neighbor of $X_n$ and zero otherwise. The main result of [CD87] is that
ERRW is a mixture of Markov chains, and that the edge occupation vector
converges to a random limit whose density may be explicitly identified.
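
The definition translates directly into a simulation. The following sketch assumes a finite simple graph (no loops or multiple edges) given as a list of vertex pairs; the name `simulate_errw` is illustrative.

```python
import random


def simulate_errw(edges, start, steps, seed=0):
    """Simulate edge-reinforced random walk with initial weight 1 on every edge.

    Returns the path X_0, ..., X_steps and the final weights W(e, steps)."""
    rng = random.Random(seed)
    weight = {frozenset(e): 1 for e in edges}
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)

    path = [start]
    for _ in range(steps):
        x = path[-1]
        options = nbrs[x]
        # Choose the next vertex with probability proportional to the edge weight.
        w = [weight[frozenset((x, z))] for z in options]
        nxt = rng.choices(options, weights=w)[0]
        # Reinforce the traversed edge by 1, regardless of direction.
        weight[frozenset((x, nxt))] += 1
        path.append(nxt)
    return path, weight
```

Dividing the returned weights by their sum gives the normalized edge occupation vector whose limit is described in the theorem below.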



Theorem 5.1 ([Dia88, (4.2)]). Let $\{X_n\}$ be an ERRW on the finite graph
$G = (V, E)$ beginning from the vertex $v_0$. Then $\{X_n\}$ is a mixture of Markov
chains, meaning that there is a measure $\mu$ on transition probabilities $\{p(v,w) :
\{v,w\} \in E\}$ such that

$$\mathbb{P}(X_0 = v_0, \ldots, X_n = v_n) = \int p(v_0, v_1) \cdots p(v_{n-1}, v_n) \, d\mu(p) .$$

Furthermore, the weights $W := \{W(e,n) : e \in E\}$, normalized to sum to 1, approach
a random limit continuous with respect to Lebesgue measure on the simplex
$\{w : w(e) > 0, \sum_e w(e) = 1\}$ of sequences of nonnegative numbers indexed by $E$ and
summing to 1. The density of the limit is given by

$$C \prod_{e \in E} w(e)^{1/2} \prod_{v \in V} w(v)^{-(d(v)+1)/2} \, w(v_0)^{1/2} \, |A|^{1/2} \qquad (5.1)$$

where $w(v)$ denotes the sum of $w(e)$ over edges $e$ adjacent to $v$, $d(v)$ is the degree
of $v$, and $A$ is the matrix indexed by cycles $C$ forming a basis for the homology
group $H_1(G)$ with $A(C,C) := \sum_{e \in C} 1/w(e)$ and $A(C,D) := \sum_{e \in C \cap D} \pm 1/w(e)$,
with a positive sign if $e$ has the same orientation in $C$ and $D$ and a negative
sign otherwise. □

This result is proved by invoking a notion of partial exchangeability [dF38],
shown by [DF80] to imply that a process is a mixture of Markov chains.* The
formula (5.1) is then proved by a direct computation. The computation was
never written down and remained unavailable until a more general proof was
published by Keane and Rolles [KR99].

* When the process may not visit sites infinitely often, some care must be taken in deducing
the representation as a mixture of Markov chains from partial exchangeability; see for example
the recent paper of Merkl and Rolles [MR07].

The definition extends easily to ERRW
on the infinite lattice $\mathbb{Z}^d$ and Diaconis posed the question of recurrence:

Question 5.1. Does ERRW on this lattice return to the origin with probability 1?

This question, still open, has provoked a substantial amount of study. Early 
results on ERRW and some of its generalizations are discussed in the next sub- 
section; the following subsections concern two other variants: vertex-reinforced 
random walk and continuous time reinforced random walk on a graph. For further
results on all sorts of ERRW models, the reader is referred to the short but
friendly survey [MR06].



5.1. Edge-reinforced random walk on a tree 



A preliminary observation is that ERRW on a directed graph may be represented
by a network of Polya urn processes. That is, suppose that $\mathbb{P}(X_{n+1} = w \mid \mathcal{F}_n)$ is
proportional to one plus the number of directed transits from $X_n$ to $w$. Then for
each vertex $v$, the sequence of vertices visited after each visit to $v$ is distributed
exactly as a Polya urn process whose initial composition is one ball of color
$w$ for each neighbor $w$ of $v$; as $v$ varies, these urns are independent. Formally,
consider a collection of independent Polya urns labeled by vertices $v \in V$, the
contents of each of which are initially a single ball of color $w$ for each neighbor $w$
of $v$; let $\{X_{n,v} : n = 1, 2, \ldots\}$ denote the sequence of draws from urn $v$; then we
may couple an ERRW $\{X_n\}$ to the independent urns so that $X_{n+1} = w \iff
X_{s,v} = w$, where $v = X_n$ and $s$ is the number of times $v$ has been visited at time $n$.

For the usual undirected ERRW, no such simple representation is possible
because the probabilities of successive transitions out of $v$ are affected by which
edges the path has taken coming into $v$. However, if $G$ is a tree, then the first
visit to $v \neq v_0$ must be along the unique edge incident to $v$ leading toward $v_0$
and the $(n+1)$st visit to $v$ must be a reverse traversal of the edge by which the
walk left $v$ for the $n$th time. This observation, which is the basis for Lemma 2.4,
was used by [Pem88a] to represent ERRW on an infinite tree by an infinite
collection of independent urns. In this analysis, the reinforcement was generalized
from 1 to an arbitrary constant $c > 0$. The urn process corresponding to $v \neq v_0$
has initial composition $(1 + c, 1, \ldots, 1)$, where the first component corresponds to
the color of the parent of $v$, and reinforcement $2c$ each time. Recalling from (4.2)
that such an urn is exchangeable with limit distribution that is Dirichlet with
parameters $(1 + c)/(2c), 1/(2c), \ldots, 1/(2c)$, one has a representation of ERRW
on a tree by a mixture of Markov chains whose transition probabilities out of
each vertex are given by picks from the specified independent Dirichlet
distributions. This leads to the following phase transition result (see also extensions
by Collevecchio [Col04; Col06a; Col06b]).

Theorem 5.2 ([Pem88a, Theorem 1]). There is a constant $c_0 \approx 4.29$ such that
ERRW on an infinite binary tree is almost surely transient if $c < c_0$ and almost
surely recurrent if $c > c_0$. □
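
The random environment behind this representation is easy to sample. The sketch below draws the transition probabilities out of a non-root vertex from the Dirichlet distribution with parameters $(1+c)/(2c), 1/(2c), \ldots, 1/(2c)$ stated above, using the standard construction of a Dirichlet vector from normalized independent gamma variables. The function names are illustrative and the code is not taken from [Pem88a].

```python
import random


def dirichlet(alphas, rng):
    """Sample a Dirichlet vector by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def tree_transition_probs(c, num_children=2, rng=None):
    """Transition probabilities out of a non-root vertex of the tree.

    Index 0 is the step toward the parent; the remaining entries are the children."""
    rng = rng or random.Random()
    alphas = [(1.0 + c) / (2.0 * c)] + [1.0 / (2.0 * c)] * num_children
    return dirichlet(alphas, rng)
```

Under this distribution the mean probability of stepping toward the parent is $(1+c)/(1+c+b)$ for a vertex with $b$ children, so a larger $c$ pulls the walk back toward the root more strongly, in line with recurrence for large $c$.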

5.2. Other edge-reinforcement schemes

The reinforcement scheme may be generalized in several ways. Suppose the
transition probabilities out of $X_n$ at step $n$ are proportional not to the weights
$w(\{X_n, w\}, n)$ incident to $X_n$ at time $n$ but instead to $F(w(\{X_n, w\}, n))$ where
$F : \mathbb{Z}^+ \to \mathbb{R}^+$ is any nondecreasing function. Letting $a_n := F(n) - F(n-1)$,
one might alternatively imagine that the reinforcement is $a_n$ on the $n$th time
an edge is crossed (see the paragraph in Section 3.2 on ordinal dependence).
Davis [Dav99] calls this a reinforced random walk of sequence type. A special
case of this is when $a_1 = \delta$ and $a_n = 0$ for $n \ge 2$. This is called once-reinforced
random walk for the obvious reason that the reinforcement occurs only once,
and its invention is usually attributed to M. Keane. More generally, one might
take the sequence to be different for every edge, that is, for each edge $e$ there is
a nondecreasing function $F^e : \mathbb{Z}^+ \to \mathbb{R}^+$ and $\mathbb{P}(X_{n+1} = w \mid \mathcal{F}_n)$ is proportional
to $F^e(w(e,n))$ with $e = \{X_n, w\}$.

It is easy to see that for random walks of sequence type on any graph, if
$\sum_{n=1}^{\infty} 1/F(n) < \infty$ then with positive probability the sequence of choices out
of a given edge will fixate. This extends to

Theorem 5.3 ([Dav90, Theorem 3.2]). Let $\{X_n\}$ be a random walk of sequence
type on $\mathbb{Z}$. If $\sum_{n=1}^{\infty} 1/F(n) < \infty$ then $\{X_n\}$ almost surely is eventually trapped
on a single edge. Conversely, if $\sum_{n=1}^{\infty} 1/F(n) = \infty$ then $\{X_n\}$ visits every vertex
infinitely often almost surely.

Proof: Assume first that $\sum_{n=1}^{\infty} 1/F(n) < \infty$. To see that $\sup_n X_n < \infty$ with
probability 1, it suffices to observe that for each $k$, conditional on ever reaching $k$,
the probability that $\sup_n X_n = k$ is bounded below by $\prod_{n=1}^{\infty} F(n)/(1 + F(n))$,
which is nonzero. The same holds for $\inf_n X_n$, implying finite range almost
surely. To improve this to almost sure fixation on a single edge, Davis applies
Herman Rubin's Theorem (Theorem 3.6) to show that the sequence of choices
from each vertex eventually fixates. Conversely, if $\sum_{n=1}^{\infty} 1/F(n)$ is infinite, then
each choice is made infinitely often from each vertex, immediately implying
recurrence or convergence to $\pm\infty$. The latter is ruled out by means of an argument
based on the fact that the sum $M_n := \sum_{j=1}^{X_n} 1/F(w(\{j-1, j\}, n))$ of the inverse
weights up to the present location is a supermartingale [Dav90, Lemma 3.0]. □



Remark 5.4. The most general ERRW considered in the literature appears
in [Dav90]. Here, the weights $\{w(e,n)\}$ are arbitrary random variables subject
to $w(e,n) \in \mathcal{F}_n$ and $w(e, n+1) \ge w(e,n)$ with equality unless $e = \{X_n, X_{n+1}\}$.
The initial weights may be arbitrary as well, with the term initially fair used
to denote all initial weights equal to 1. At this level of generality, there is no
exchangeability and the chief techniques are based on martingales. Lemma 3.0
of [Dav90], used to rule out convergence to $\pm\infty$, is in fact proved in the context
of such a general, initially fair ERRW.

When the graph is not a tree, many of the arguments become more difficult.
Sellke [Sel94] extended the martingale technique to sequence-type ERRW on
the $d$-dimensional integer lattice. Because of the bipartite nature of the graph,
one must consider separately the sums $\sum_{n=1}^{\infty} 1/F(2n)$ and $\sum_{n=1}^{\infty} 1/F(2n+1)$.
For convenience, let us assume these two both converge or both diverge.

Theorem 5.5 ([Sel94, Theorems 1-3]). If $\sum_{n=1}^{\infty} 1/F(n) < \infty$ then with
probability one, the process is eventually trapped on a single edge. If $\sum_{n=1}^{\infty} 1/F(n) =
\infty$, then with probability one, the range is infinite and each coordinate is zero
infinitely often. □

The proofs are idiosyncratic, based on martingales and Rubin's construction.
It is noted that (i) the conclusion in the case $\sum_{n=1}^{\infty} 1/F(n) = \infty$ falls short of
recurrence; and (ii) that the conclusion of almost sure trapping in the opposite
case is specific to bipartite graphs, with the argument not generalizing to the
triangular lattice, nor even to a single triangle! This situation was not remedied
until Limic [Lim03] proved that for ERRW on a triangle, when $F(n) = n^{\rho}$ for
$\rho > 1$, the walk is eventually trapped on a single edge. This was generalized
in [LT06] to handle any $F$ with $\sum_{n=1}^{\infty} 1/F(n) < \infty$.



Because of the difficulty of proving results for sequence-type ERRW, it was 
thought that the special case of once-reinforced random walk might be a more 
tractable place to begin. Even here, no one has settled the question of recurrence 
versus transience for the two-dimensional integer lattice. The answer is known 
for a tree. In contrast to the phase transition in ordinary ERRW on a tree, a
once-reinforced ERRW is transient for every $\delta > 0$ (in fact the same is true when
"once" is replaced by "$k$ times"). This was proved first for regular trees
and then extended to Galton-Watson trees in [Die05].

The only other graph for which I am aware of an analysis of once-reinforced
ERRW is the ladder. Let $G$ be the product of $\mathbb{Z}$ with $K_2$ (the unique connected
two-vertex graph); the vertices are $\mathbb{Z} \times \{0, 1\}$ and the edges connect neighbors
of $\mathbb{Z}$ with the same $K_2$-coordinate or two vertices with the same $\mathbb{Z}$ coordinate.
The following recurrence result was first proved by T. Sellke in 1993 in the more
general context of allowing arbitrary vertical movement (cf. [MR05]).

Theorem 5.6 ([Sel06, Theorem]). For any $\delta > 0$, once-reinforced random walk
on the ladder is recurrent. □



Fig 3. The ladder graph of Theorem 5.6



A recent result of Merkl and Rolles (their Theorem 1.1) proves this for
ERRW (as opposed to once-reinforced ERRW) as long as the ratio of the
reinforcement parameter to the initial edge weights is less than 4/3.

A slight variation on the definition of once-reinforced random walk gives the
excited random walk, introduced in [BW03] and taken up in [Vol03; Zer04].
These results are too recent to have been included in this survey.



5.3. Vertex-reinforced random walk 

Recall that the vertex-reinforced random walk (VRRW) is defined analogously
to the ERRW except that in the equation (2.1) for choosing the next
step, the edge occupation counts (2.2) are replaced by the vertex occupation
counts (2.3).

This leads to entirely different behavior. Partial exchangeability is lost, so
there is no representation as a random walk in a random environment. There are
no obvious embedded urns. Moreover, an arbitrary occupation vector is unlikely
to be evolutionarily stable. That is, suppose for some large $n$, the normalized
occupation vector $\mathbf{X}_n$, whose components are the portion of the time spent at
each vertex, is equal to a vector $\mathbf{x}$. Let $\pi_{\mathbf{x}}$ denote the stationary measure for the
Markov chain with transition probabilities $p(y,z) = x_z / \sum_{z' \sim y} x_{z'}$, which moves
proportionally to the coordinate of $\mathbf{x}$ corresponding to the destination vertex.
For $1 \ll k \ll n$, $\mathbf{X}_{n+k} = (1 + o(1))\mathbf{X}_n$, so the proportion of the time in $[n, n+k]$
that the walk spends at vertex $y$ will be proportional to $\pi_{\mathbf{x}}(y)$. It is easy to
see from this that $\{\mathbf{X}_n\}$ obeys a stochastic approximation equation (2.6) with

$$F(\mathbf{x}) = \pi_{\mathbf{x}} - \mathbf{x} .$$

The analysis from here depends on the nature of the graph. The methods
of Section 2.5 show that $\mathbf{X}_n$ is an asymptotic pseudotrajectory for the flow
$d\mathbf{X}/dt = F(\mathbf{X})$, converging to an equilibrium point or orbit. There is always
a Lyapunov function $V(\mathbf{x}) := \mathbf{x}^T A \mathbf{x}$ where $A$ is the incidence matrix of the
underlying graph $G$. Therefore, equilibrium sets are sets of constancy for $V$ and
any equilibrium point $p$ is a critical point for $V$ restricted to the face of the
$(d-1)$-simplex containing $p$. Any attractor for the flow appears as a limit with
positive probability, while linearly unstable orbits occur with probability zero.
Several examples are given in [Pem88b].



Example 5.1. Let $G$ be the complete graph on $d$ vertices (with no self-edges).
The zeros of $F$ are the centroids of faces. The global centroid $(1/d, \ldots, 1/d)$ is an
attractor. Each other centroid is a permutation of some $(1/k, \ldots, 1/k, 0, \ldots, 0)$
and is easily seen to be linearly unstable [Pem88b, page 110]. It follows that
$\mathbf{X}_n \to (1/d, \ldots, 1/d)$ with probability 1.
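
The vector field $F$ can be computed explicitly, which makes the zeros in Example 5.1 easy to check. The chain with transition probabilities $p(y,z) = x_z / \sum_{z' \sim y} x_{z'}$ satisfies detailed balance with respect to the edge weights $x_y x_z$, so its stationary measure is $\pi_{\mathbf{x}}(y) \propto x_y \sum_{z \sim y} x_z$. The sketch below uses this observation; the 0/1 matrix `adj` records which vertices are neighbors, all function names are illustrative, and the code assumes $\mathbf{x}$ puts positive mass on at least one pair of neighboring vertices.

```python
def stationary_measure(x, adj):
    """pi_x(y) proportional to x[y] * sum_{z ~ y} x[z] (detailed balance)."""
    n = len(x)
    s = [sum(adj[y][z] * x[z] for z in range(n)) for y in range(n)]
    total = sum(x[y] * s[y] for y in range(n))  # requires total > 0
    return [x[y] * s[y] / total for y in range(n)]


def drift(x, adj):
    """The stochastic approximation vector field F(x) = pi_x - x."""
    pi = stationary_measure(x, adj)
    return [p - xi for p, xi in zip(pi, x)]


def complete_graph(d):
    """Neighbor matrix of the complete graph on d vertices, no self-edges."""
    return [[0 if i == j else 1 for j in range(d)] for i in range(d)]
```

For the complete graph on four vertices, `drift` vanishes at the global centroid and at the face centroid $(1/2, 1/2, 0, 0)$, as the example asserts. The normalizing constant `total` equals $\sum_{y \sim z} x_y x_z$ taken over ordered pairs of neighbors, which has the quadratic form of the function $V$ above.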

Example 5.2. Let $G$ be a cycle of $d$ nodes for $d \ge 5$ (the smaller cases turn
out to behave differently). The centroid $(1/d, \ldots, 1/d)$ is still an isolated
equilibrium but for $d \ge 5$, it is linearly unstable. Although it was only guessed
at the time this example appeared in [Pem88b], it follows from the nonconvergence
theorems of [Pem90a; BH95] that the probability of convergence to
the centroid is zero. The other equilibria are cyclic permutations of the points
$(a, 1/2, 1/2 - a, 0, \ldots, 0)$ and certain convex combinations of these. It was
conjectured in [Pem88b] and corroborated by simulation that the extreme points,
namely cyclic permutations of $(a, 1/2, 1/2 - a, 0, \ldots, 0)$, were the only possible
limits.

Taking $d \to \infty$ in the last example results in VRRW on the one-dimensional
integer lattice. The analogous conjecture is that occupation measure for VRRW
on $\mathbb{Z}$ converges to a translation of $\ldots, 0, 0, a, 1/2, 1/2 - a, 0, 0, \ldots$. It was shown
[PV99, Theorem 1.1] that this happens with positive probability, leaving open
the question of whether this occurs almost surely. Volkov strengthened and
generalized this result. For infinite trees of bounded degree, VRRW gets trapped
almost surely on a finite subtree [Vol01, Theorem 2]. In fact Volkov gives a graph
theoretic definition of a trapping subgraph and shows that every finite graph
has a trapping subgraph.* Volkov shows that on any locally finite connected
graph without self-edges, if there is a trapping subgraph, $H$, then VRRW is
trapped with positive probability on the union of $H$ and its neighbors. The
neighbors of $H$ are visited with frequency zero, according to specified power
laws [Vol01, Corollary 1]. Finally, Tarrès was able to close the gap from [PV99]
for the one-dimensional lattice by proving almost sure trapping on an interval
of exactly five vertices, with the conjectured power laws.

Theorem 5.7 ([Tar04, Theorem 1.4]). Let $\{X_n\}$ denote VRRW on $\mathbb{Z}$. With
probability 1 there are (random) $k \in \mathbb{Z}$, $\alpha \in (0,1)$ and $C_1, C_2 > 0$ such that the
following occur.

(i) The set of vertices visited infinitely often is $\{k-2, k-1, k, k+1, k+2\}$;

(ii) The set of vertices visited with positive frequency is $\{k-1, k, k+1\}$ and
these three frequencies have limits given respectively by $\alpha/2$, $1/2$ and
$(1-\alpha)/2$;

(iii) The occupation measure at $k-2$ is asymptotic to $C_1 n^{\alpha}$ and the occupation
measure at $k+2$ is asymptotic to $C_2 n^{1-\alpha}$.

□



5.4. An application and a continuous-time model

Slime mold 

A mechanism by which simple organisms move in purposeful directions is called 
taxis. The organism requires a signal to govern such motion, which is usually 
something present in the environment such as sunlight, chemical gradient or 
particles of food. 



Othmer and Stevens [OS97] consider instances in which the organism's
response modifies the signal. In particular, Othmer and Stevens study
myxobacteria: organisms which produce slime, over which it is then easier for bacteria to
travel in the future. Aware of the work of Davis on ERRW [Dav90], they
propose a stochastic cellular automaton to model the propagation of one or more
bacteria. One of their goals is to determine what features of a model lead to
stable aggregation of organisms; apparently previous such models have led to
aggregates forming but then disbanding.



* In fact every graph of bounded degree has a trapping subgraph, though R. Thomas (personal
communication) has found an infinite, locally finite graph with no trapping subgraph.

In the Othmer-Stevens model, the build-up in slime at the intersection points
of the integer lattice is modeled by postulating that the likelihood for the or- 
ganism to navigate to a given next vertex is one plus the number of previous 
visits to that site (by any organism). With one organism, this is just a VRRW, 
which they call a "simplified Davis' model", the simplification being to go from 
ERRW to VRRW. They allow a variable weight function W{n) = ^k- On 

page 1047, Othmer and Stevens describe results from simulations of the "sim- 
plified" VRRW for a single particle. Their analysis of the simulations may be 
paraphrased as follows. 

If F{n) grows exponentially, the particle ultimately oscillates between two points. 
If F grows linearly with a small growth rate, the particle does not stay in a fixed 
finite region. These two results agree with the theoretical result, which is proven, 
however, only in one dimension. If the growth is linear with a large growth rate, 
results of the simulation are "no longer comparable to the theoretical prediction" 
but this is because the time for a particle to leave a fixed finite region increases 
with the growth rate of F. 

Given what we know about VRRW, we can give a different interpretation of
the simulation data. We know that VRRW, unlike ERRW, fixates on a finite set.
The results of [Vol01] imply that on $\mathbb{Z}^2$ the fixation set has positive probability
both of being a 4-cycle and of being a plus sign (a vertex and its four neighbors).
All of this is independent of the linear growth rate. Therefore, the simulations
with large growth rates do agree with theory: the particle is being trapped rather
than exiting too slowly to observe. On the other hand, for small values of the
linear reinforcement parameter, the particle must also be trapped in the end,
and in this case it is the trapping that occurs too slowly to observe. The power
laws in [Vol01, Corollary 1] and part (iii) of Theorem 5.7 give an indication of
why the trapping may occur too slowly to observe.

Othmer and Stevens are ultimately concerned with the behavior of large
collections of myxobacteria, performing a simultaneous VRRW (each particle
at each time step chooses the next step independently, with probabilities
proportional to the total reinforcement due to any particle's visits to the
destination site). They make the assumption that the system may be described
by differential equations corresponding to the mean-field limit of the system,
where the state is described by a density over Euclidean space. They then give a
rigorous analysis of the mean-field differential equations, presumably related to scaling
limits of ERRW.* The mean-field assumption takes us out of the realm of
rigorous mathematics, so we will leave Othmer and Stevens here, but in the end
they are able to argue that stable aggregation may be brought about by the
purely local mechanisms of reinforced random walk.

A continuous-time reinforced jump process 

The next section treats a number of continuous-time models. I include the
vertex-reinforced jump process in this section because it is a process on
discrete space which does not involve a scaling limit and seems similar to the other
models in this section.

* Analyses of these reaction-diffusion equations may be found elsewhere in the literature as well.

The vertex-reinforced jump process (VRJP) is a continuous-time process
on the one-dimensional lattice. From any site $x$, at time $t$, it jumps to each
nearest neighbor $y$ at rate equal to one plus the amount of time, $L(y,t)$, that the
process has spent at $y$. On a state space that keeps track of occupation measure
as well as position, it is Markovian. The process is defined and constructed
in [DV02] and attributed to W. Werner; because the jump rate at time $t$ is
bounded by $2 + t$, the definition is completely routine. We may obtain a
one-parameter family of reinforcement strengths by jumping at rate $C + L(y,t)$
instead of $1 + L(y,t)$.
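
Because the process is not at either neighbor while it sits at $x$, the two jump rates are constant during each stay, so the VRJP can be simulated exactly with exponential holding times. The sketch below does this on $\mathbb{Z}$ started from the origin; it is an illustration under that observation, not the construction of [DV02], and the name `simulate_vrjp` is hypothetical.

```python
import random


def simulate_vrjp(t_max, c=1.0, seed=0):
    """Simulate the vertex-reinforced jump process on Z up to time t_max.

    Jumps from x to a neighbor y occur at rate c + L(y, t).
    Returns the final position and the local times L(., t_max) as a dict."""
    rng = random.Random(seed)
    local_time = {}
    x, t = 0, 0.0
    while True:
        # Rates are frozen during a stay at x: neighbors accrue no local time.
        rate_left = c + local_time.get(x - 1, 0.0)
        rate_right = c + local_time.get(x + 1, 0.0)
        hold = rng.expovariate(rate_left + rate_right)
        if t + hold >= t_max:
            local_time[x] = local_time.get(x, 0.0) + (t_max - t)
            return x, local_time
        local_time[x] = local_time.get(x, 0.0) + hold
        t += hold
        # Choose the direction with probability proportional to its rate.
        if rng.random() * (rate_left + rate_right) < rate_left:
            x -= 1
        else:
            x += 1
```

Dividing the returned local times by `t_max` gives an approximation to the normalized occupation measure whose limit is described in Theorem 5.8.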

The VRJP is a natural continuous-time analogue of VRRW. An alternative
analogue would have been to keep the total jump rate out of $x$ at 1, the chance
of a jump to $y = x \pm 1$ remaining proportional to the occupation measure at $y$.
In fact the choice of variable jump rates decouples jumps to the left from jumps
to the right, making the process more tractable. On any two consecutive sites
$a$ and $a + 1$, let $m(t)$ denote the occupation measure of $a + 1$ the first time the
occupation measure of $a$ is $t$. Then $m(t)/t$ is a martingale [DV02, Corollary 2.3],
which implies convergence of the ratio of occupation measures at $a + 1$ and $a$.
Together with some computations, this leads to an exact characterization of the
(unscaled) random limit normalized occupation measure.



Theorem 5.8 ([DV02, Theorem 1.1]). Let $L(n,t) := \int_0^t \mathbf{1}_{X_s = n} \, ds$ be the
occupation measure at $n$ at time $t$. Then the limit $Y_n := \lim_{t \to \infty} t^{-1} L(n,t)$ exists
for each $n$. Let $\{U_n\}$ be IID random variables with density

$$\frac{\exp\left(-\frac{1}{2}\left(\sqrt{x} - \frac{1}{\sqrt{x}}\right)^2\right)}{\sqrt{2\pi}\, x^{3/2}} .$$

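The collection $\{Y_n : n \in \mathbb{Z}\}$ is distributed as $\{W_n / \sum_{k=-\infty}^{\infty} W_k : n \in \mathbb{Z}\}$ where
$W_0 = 1$, $W_n = \prod_{k=1}^{n} U_k$ if $n > 0$ and $W_n = \prod_{k=n}^{-1} U_k$ if $n < 0$. □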

This process may be defined on any locally finite graph. Limiting ratios of the
occupation measure at neighboring vertices have the same description. On an
infinite regular tree, this leads as in [Pem88a] to a transition between recurrence
and transience, depending on the reinforcement parameter, $C$; see [DV04].



6. Continuous processes, limiting processes, and negative 
reinforcement 



In this section we will consider continuous processes with reinforcement. Espe- 
cially when these are diffusions, they might be termed "reinforced Brownian 
motion" . Some of these arise as scaling limits of reinforced random walks, while 
others are defined directly. We then consider some random walks with negative 
reinforcement. The most extreme example is the self-avoiding random walk, 
which is barred from going where it has gone before. Limits of self-avoiding 
walks turn out to be particularly nice continuous processes. 

6.1. Reinforced diffusions

Random walk perturbed at its extrema

Recall the once-reinforced random walk of Section 5.2. This is a sequence-type
ERRW with $F(0) = 1$ and $F(n) = 1 + \delta$ for any $n \ge 1$. The transition
probabilities for this walk may be phrased as $\mathbb{P}(k, k+1) = 1/2$ unless $\{X_n\}$ is
at its maximum or minimum value, in which case $\mathbb{P}(k, k+1) = 1/(2 + \delta)$ or
$(1 + \delta)/(2 + \delta)$ respectively.
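
These transition probabilities depend on the past only through the running minimum and maximum, which the following sketch makes explicit. It assumes the walk starts at 0 on $\mathbb{Z}$, so that the edge $\{x, x+1\}$ has been crossed exactly when $x$ lies below the running maximum; the function names are illustrative.

```python
import random


def prob_step_right(x, lo, hi, delta):
    """P(k, k+1) for once-reinforced walk on Z at position x.

    lo and hi are the running minimum and maximum of the walk so far."""
    w_right = 1.0 + delta if x < hi else 1.0  # edge {x, x+1} crossed iff x < hi
    w_left = 1.0 + delta if x > lo else 1.0   # edge {x-1, x} crossed iff x > lo
    return w_right / (w_left + w_right)


def simulate_once_reinforced(steps, delta, seed=0):
    """Simulate the once-reinforced random walk on Z started at 0."""
    rng = random.Random(seed)
    x = lo = hi = 0
    path = [0]
    for _ in range(steps):
        x += 1 if rng.random() < prob_step_right(x, lo, hi, delta) else -1
        lo, hi = min(lo, x), max(hi, x)
        path.append(x)
    return path
```

At the very first step the walk is at both its maximum and its minimum, neither edge has been crossed, and the sketch returns probability 1/2.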

If such a process has a scaling limit, the limiting process would evolve as a
Brownian motion away from its left-to-right maxima and minima, plus some
kind of drift inwards when it is at a left-to-right extremum. This inward drift
might come from a local time process, but constructions depending on local
time processes involve considerable technical difficulty (see, e.g., [TW98]). An
alternate approach is an implicit definition that makes use of the maximum
or minimum process, recalling the way a reflecting Brownian motion may be
constructed as the difference, $\{B_t - B_t^{\#}\}$, between a Brownian motion and its
minimum process.

Let $\alpha, \beta \in (-\infty, 1)$ be fixed, let $g^*(t) := \sup_{0 \le s \le t} g(s)$ denote the maximum
process of a function $g$ and let $g^{\#}(t) := \inf_{0 \le s \le t} g(s)$ denote its minimum
process. Carmona, Petit and Yor [CPY98] examine the equation

$$g(t) = f(t) + \alpha g^*(t) + \beta g^{\#}(t), \qquad t \ge 0. \qquad (6.1)$$

They show that if $f$ is any continuous function vanishing at 0, then there is a
unique solution $g(t)$ to (6.1), provided that $\rho := |\alpha\beta / ((1-\alpha)(1-\beta))| < 1$. If
$f$ is the sample path of a Brownian motion, then results of [CPY98] imply that
the solution $Y_t := g(t)$ to (6.1) is adapted to the Brownian filtration. It is a
logical candidate for a "Brownian motion perturbed at its extrema".

In 1996, Burgess Davis [Dav96] showed that the Carmona-Petit-Yor process
is in fact the scaling limit of the once-reinforced random walk. His argument is
based on the property that the map taking $f$ to $g$ in (6.1) is bounded: $|g_1 - g_2| \le
C|f_1 - f_2|$. The precise statement is as follows. Let $\alpha = \beta = -\delta$. The process $g$
will be well defined, since $\rho = |\delta/(1 + \delta)|^2 < 1$.

Theorem 6.1 ([Dav96, Theorem 1.2]). Let $\{X_n : n \ge 0\}$ be the once-reinforced
random walk with parameter $\delta > 0$. Let $\{Y_t : t \ge 0\}$ solve (6.1) with $\alpha = \beta = -\delta$
and $f$ a Brownian motion. Then

$$\{n^{-1/2} X_{\lfloor nt \rfloor} : t \ge 0\} \xrightarrow{\mathcal{D}} \{Y_t : t \ge 0\}$$

as $n \to \infty$. □



Drift as a function of occupation measure 



Suppose one wishes to formulate a diffusion that behaves like a Brownian
motion, pushed according to a drift that depends in some natural way on the past,
say through the occupation measure. There are a multitude of ways to do this.
One way, suggested by Durrett and Rogers [DR92], is to choose a function $f$ and
let the drift of the diffusion $\{X_t\}$ be given by $\int_0^t f(X_t - X_s) \, ds$. If $f$ is Lipschitz
then there is no trouble in showing that the equation

$$X_t = B_t + \int_0^t ds \int_0^s du \, f(X_s - X_u)$$

has a pathwise unique strong solution.

Durrett and Rogers were the first to prove anything about such a process,
but they could not prove much. When $f$ has compact support, they proved
that in any dimension there is a nonrandom bound $\limsup_{t \to \infty} |X_t|/t \le C$
almost surely, and that in one dimension when $f \ge 0$ and $f(0) > 0$, then
$X_t/t \to \mu$ almost surely for some nonrandom $\mu$. The condition $f \ge 0$ was
weakened by [CM96] to be required only in a neighborhood of 0. Among Durrett
and Rogers' conjectures are that if $f$ is a compactly supported odd function
with $x f(x) \ge 0$, then $X_t/t \to 0$ almost surely. Their reasoning is that the
process should behave roughly like a negatively once-reinforced random walk.
It sees only the occupation in an interval, say $[X_t - 1, X_t + 1]$, drifting linearly
to the right for a while due to the imbalance in the occupation, until diffusive
fluctuations cause it to go to the left of its maximum. Now it should get pushed to
the left at roughly linear rate until it suffers another reversal. They were not able
to make this rigorous. However, taking the support to zero while maintaining
$\int_{-\infty}^{\infty} f(x) \, dx = c$ gives a very interesting process about which Tóth and Werner
were able to obtain results (see Section 6.3 below).

Cranston and Le Jan take up this model in two special cases. When
$f(x) = -ax$ with $a > 0$, there is a restoring force equal to the total moment of
the occupation measure about the present location. The restoring force increases
without bound, so it may not be too surprising that $X_t$ converges almost surely
to the mean of the limiting occupation measure.

Theorem 6.2 ([CLJ95, Theorem 1]). Let $a > 0$ and set $f(x) = -ax$. Then there
is a random variable $X_\infty$ such that $X_t \to X_\infty$ almost surely and in $L^2$.



Proof: This may be derived from the fact that the stochastic differential
equation

$$dX_t = dB_t - \left( a \int_0^t (X_t - X_u)\,du \right) dt$$

has the (unique) strong solution

$$X_t = \int_0^t h(t,s)\,dB_s$$

with $h(t,s) = 1 - a s\, e^{a s^2/2} \int_s^t e^{-a u^2/2}\,du$. □
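
The formula for $h$ can be checked by a short computation. The following is a sketch only: the auxiliary process $Z_t := \int_0^t (X_t - X_u)\,du$ is notation introduced here for convenience and does not appear in [CLJ95], and the interchange of integrals in the last line is assumed to be justified by a stochastic Fubini argument.

```latex
\begin{align*}
dZ_t &= t\,dX_t = t\,dB_t - a\,t\,Z_t\,dt, \qquad Z_0 = 0,\\
Z_t  &= \int_0^t s\,e^{-a(t^2 - s^2)/2}\,dB_s,\\
X_t  &= B_t - a\int_0^t Z_u\,du
      = \int_0^t \Bigl(1 - a\,s\,e^{a s^2/2}\int_s^t e^{-a u^2/2}\,du\Bigr)\,dB_s .
\end{align*}
```

The first line uses $Z_t = tX_t - \int_0^t X_u\,du$, the second solves the resulting linear equation with the integrating factor $e^{at^2/2}$, and the third exchanges the order of integration.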
The other case they consider is $f(x) = -a\,\mathrm{sgn}(x)$. It is not hard to show
existence and uniqueness despite the discontinuity. This time, the restoring force
is toward the median rather than the mean, but otherwise the same result,
$X_t \to X_\infty$, should and does hold [CLJ95, Theorem 2]. This was extended to
higher dimensions by the following theorem of Raimond.

Theorem 6.3 ([Rai97, Theorem 1]). Let $B_t$ be $d$-dimensional Brownian motion
and fix $\sigma > 0$. Then the process $\{X_t\}$ that solves $X(0) = 0$ and

$$dX_t = dB_t - \sigma \left( \int_0^t \frac{X_t - X_s}{\|X_t - X_s\|}\,ds \right) dt$$

converges almost surely. □



Drift as a function of normalized occupation measure 



The above diffusions have drift terms that are additive functionals of the full
occupation measure $\mu_t := \mu_0 + \int_0^t \delta_{X_s}\,ds$. The papers that analyze this kind
of diffusion are [DR92; CLJ95; Rai97; CM96]; see also [NRW87]. In a series
of papers [BLR02; BR02; BR03; BR05], Benaïm and Raimond (and sometimes
Ledoux) consider diffusions whose drift is a function of the normalized occupation
measure $\pi_t := t^{-1}\mu_t$. Arguably, this is closer in spirit to the reinforced
random walk. Another difference in the direction taken by Benaïm and Raimond
is that their state space is a compact manifold without boundary. This sets it
apart from continuum limits of reinforced random walks on $\mathbb{Z}^d$ (not compact)
or limits of urn processes on the $(d-1)$-simplex (has boundary).

The theory is a vast extension of the dynamical system framework discussed
in Section 2.5. To define the object of study, let $M$ be a compact Riemannian
manifold. There is a Riemannian probability measure, which we call simply $dx$,
and a standard Brownian motion defined on $M$ which we call $B_t$. Let
$V : M \times M \to \mathbb{R}$ be a smooth "potential" function and define the function $V\mu$ by

$$V\mu(y) = \int V(x,y)\,d\mu(x).$$



The additive functional of normalized occupation measure is always taken to
be $V\pi_t = t^{-1} V\mu_t$; thus the drift at time $t$ should be $-\nabla(V\pi_t)$. Since
$V\mu(\cdot) = \int V(x,\cdot)\,d\mu(x)$, we may write the stochastic differential equation as
in [BLR02]:

$$dX_t = dB_t - \frac{1}{t} \left( \int_0^t \nabla V(X_s, X_t)\,ds \right) dt. \qquad (6.2)$$



A preliminary step establishes the existence of the process $\{X_t\}$ from any
starting point, and including the possibility of any starting occupation measure
[BLR02, Proposition 2.5]. Simultaneously, this defines the occupation measure
process $\{\mu_t\}$ and normalized occupation measure process $\{\pi_t\}$. When $t$ is
large, $\pi_{t+s}$ will remain near $\pi_t$ for a while. As in the dynamical system and
stochastic approximation framework, the next step is to investigate what happens
if one fixes the drift for times $t + s$ at $-\nabla(V\pi_t)$. A diffusion on $M$ with
drift $-\nabla f$ has an invariant measure whose density may be described explicitly
as

$$\frac{\exp(-f(x))}{\int_M \exp(-f(z))\,dz}.$$

This leads us to define a function $\Pi$ that associates with each measure $\mu$ the
density of the stationary measure for Brownian motion with potential function
$V\mu$:

$$\Pi(\mu) := \frac{\exp(-V\mu(x))}{\int_M \exp(-V\mu(z))\,dz}\,dx. \qquad (6.3)$$



The process $\{\pi_t\}$ evolves stochastically. Taking a cue from the framework of
Section 2.5, we compute the deterministic equation of mean flow. If $t \gg \delta t \gg 1$,
then $\pi_{t+\delta t}$ should be approximately $\pi_t + \frac{\delta t}{t}(\Pi(\pi_t) - \pi_t)$. Thus we are led to
define a vector field on the space of measures on $M$ by

$$F(\mu) := \Pi(\mu) - \mu.$$

A second preliminary step, carried out in [BLR02, Lemma 3.1], is that this
vector field is smooth and induces a flow $\Phi_t$ on the space $\mathcal{P}(M)$ of probability
measures on $M$ satisfying

$$\frac{d}{dt}\Phi_t(\mu) = F(\Phi_t(\mu)).$$
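
To make the mean flow concrete, here is a minimal sketch that discretizes the circle into a grid and takes Euler steps of $d\mu/dt = \Pi(\mu) - \mu$, with $\Pi$ computed from a discrete version of (6.3). The grid size, step size, function names and the cosine potential are all illustrative assumptions; measures are represented by their densities with respect to normalized Lebesgue measure.

```python
import math

def stationary_density(mu, V, grid):
    """Discrete analogue of (6.3): density proportional to exp(-V mu(y))."""
    dx = 1.0 / len(grid)  # each grid cell carries normalized Lebesgue mass 1/n
    # V mu(y) = sum over x of V(x, y) * mu(x) * dx, with mu given as a density.
    v_mu = [sum(V(x, y) * m for x, m in zip(grid, mu)) * dx for y in grid]
    weights = [math.exp(-v) for v in v_mu]
    total = sum(weights) * dx
    return [w / total for w in weights]

def mean_flow(mu0, V, grid, step=0.05, n_steps=400):
    """Euler steps for d(mu)/dt = Pi(mu) - mu, acting on densities."""
    mu = list(mu0)
    for _ in range(n_steps):
        pi_mu = stationary_density(mu, V, grid)
        mu = [m + step * (p - m) for m, p in zip(mu, pi_mu)]
    return mu

n = 32
grid = [2.0 * math.pi * k / n for k in range(n)]
V = lambda x, y: 2.0 * math.cos(x - y)          # a symmetric, self-repelling choice
mu0 = [1.0 + 0.5 * math.cos(t) for t in grid]   # perturbation of the uniform density
mu = mean_flow(mu0, V, grid)
```

For this particular potential the perturbation decays and the density returns to the uniform one, which is what one would expect from a fixed point of $\Pi$ that attracts nearby measures; other potentials can be substituted for `V` without changing the rest of the sketch.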



As with stochastic approximation processes, one expects the trajectories of
the stochastic process $\pi_t$ to approximate trajectories of $\Phi$. One expects
convergence of $\pi_t$ to fixed points or closed orbits of the flow, positive probability
of convergence to isolated sinks, and zero probability of convergence to unstable
equilibria. A good part of the work accomplished in the sequence of
papers [BLR02; BR02; BR03; BR05] is to extend results on asymptotic
pseudotrajectories in $\mathbb{R}^d$ to prove these convergence and nonconvergence results in
the space of measures on $M$. One link in the chain that does not need to be
extended is Theorem 2.14 (asymptotic pseudotrajectories have chain transitive
limits), which is already valid in a general metric space. Benaïm et al.
then go on to prove the following results. The proof of the first is quite technical
and occupies Section 5 of [BLR02].

Theorem 6.4 ([BLR02, Theorem 3.6]). The process $\pi_{e^t}$ is an asymptotic
pseudotrajectory for the flow $\Phi_t$. □

Corollary 6.5. The limit set of $\{\pi_t\}$ is almost surely an invariant chain-
recurrent set containing no proper attractor.

Theorem 6.6 ([BR05, Theorem 2.4]). Suppose that $V$ is symmetric, that is,
$V(x,y) = V(y,x)$. With probability 1, the limit set of the process $\{\pi_t\}$ is a
compact connected subset of the fixed points of $\Pi$ (that is, the zero set of $F$).

Proof: Define the free energy of a strictly positive $f \in L^2(dx)$ by

$$J(f) := J_V(f) := \frac{1}{2}\langle Vf, f\rangle + \langle f, \log f\rangle$$

where $Vf$ denotes the potential $Vf(y) = \int V(x,y) f(x)\,dx$ and $\langle f, g\rangle$ denotes
$\int f(x) g(x)\,dx$. Next, verify that $J$ is a Lyapunov function for the flow $\Phi_t$
([BR05, Proposition 4.1]) and that $F(\mu) = 0$ if and only if $\mu$ has a density $f$ and
$f$ is a critical point for the free energy, i.e., $\nabla J(f) = 0$ ([BR05, Proposition 2.9]).
The result then follows with a little work from Theorem 6.4 and the general result
that the limit set of an asymptotic pseudotrajectory is chain transitive. □

Corollary 6.7 ([BR05, Corollary 2.5]). If, in addition, the zero set of $F$
contains only isolated points then $\pi_t$ converges almost surely. □

The next two results are proved in a similar manner to the proofs of the
convergence and nonconvergence results Theorem 2.16 and Theorem 2.9, though
some additional infrastructure must be built in the infinite-dimensional case.

Theorem 6.8 ([BR05, Theorem 2.24]). If $\pi^*$ is a sink then $P(\pi_t \to \pi^*) > 0$. □

For the nonconvergence results, as well as criteria for existence of a sink, the
following definition is very useful.

Definition 6.9 (Mercer kernel). Say that $V$ is a Mercer kernel if $\langle Vf, f\rangle \ge 0$
for all $f \in L^2(dx)$.

While the assumption of a symmetric Mercer kernel may appear restrictive,
it is shown [BR05, Examples 2.14-2.20] that many classes of kernels satisfy this,
including the transition kernel for any reversible Markov semi-group, any even
function of $x - y$ on the torus $T^n$ that has nonnegative Fourier coefficients, any
completely monotonic function of $\|x - y\|^2$ for a manifold embedded in $\mathbb{R}^n$, and
any $V$ represented as $V(x,y) = \int_E G(\alpha,x) G(\alpha,y)\,d\nu(\alpha)$ for some space $E$ and
measure $\nu$ (this last class is in fact dense in the set of Mercer kernels). The
most important fact about Mercer kernels is that the associated free energy is
strictly convex.



Lemma 6.10 ([BR05, Theorem 2.13]). If $V$ is Mercer then $J$ is strictly convex,
hence has a unique critical point $f$ which is a global minimum.

Corollary 6.11. If $V$ is Mercer then the process $\pi_t$ converges almost surely to
the measure $f\,dx$ where $f$ minimizes the free energy, $J$. □

Proof of lemma: The second derivative $D^2 J$ is easily computed [BR05,
Proposition 2.9] to be

$$D^2 J_f(u,v) = \langle Vu, v\rangle + \langle u, v\rangle_{(1/f)\,dx}.$$

The second term is always positive definite, while the first is nonnegative definite
by hypothesis. □
The only nonconvergence result they prove requires a hypothesis involving 
Mercer kernels. 

Theorem 6.12 ([BR05, Theorem 2.26]). If $\pi^*$ is a quadratically nondegenerate
zero of $F$ with at least one positive eigenvalue, and if $V$ is the difference of
Mercer kernels, then $P(\pi_t \to \pi^*) = 0$. □

A number of examples are given but perhaps the most interesting is one where
$V$ is not symmetric. It is possible that there is no Lyapunov function and that
the limit set of $\pi_t$, which must be an asymptotic pseudotrajectory, may be a
nontrivial orbit. In this case, one expects that $\mu_t$ should precess along the orbit
at logarithmic speed, due to the factor of $1/t$ in the mean differential equation

$$d\pi_t/dt = (1/t) F(\pi_t).$$

Example 6.1. Let $M = S^1$ and for fixed $c$ and $\phi$ let

$$V(\theta_1, \theta_2) = 2c \cdot \cos(\theta_1 - \theta_2 + \phi).$$

When $\phi$ is not 0 or $\pi$ this is not symmetric. A detailed trigonometric analysis
shows that when $c \cdot \cos\phi \ge -1/2$, the unique invariant set for $\Phi_t$ is Lebesgue
measure, $dx$, and hence that $\pi_t \to dx$ almost surely.

Suppose now that $c \cdot \cos\phi < -1/2$. If $\phi = 0$ then the critical points of the
free energy function are a one-parameter family of zeros of $F$ with densities
$g_\theta := c_1(c)\, e^{c_2(c) \cos(x - \theta)}$. It is shown in [BLR02, Theorem 1.1] that $\pi_t \to g_Z$
almost surely, where $Z$ is a random variable. The same holds when $\phi = \pi$.

When $\phi \ne 0, \pi$ then things are the most interesting. The forward limit set
for $\{\pi_t\}$ under $\Phi$ consists of the unstable equilibrium point $dx$ (Lebesgue
measure) together with a periodic orbit $\{\rho_\theta : \theta \in S^1\}$, obtained by averaging $g_\theta$
while moving with logarithmic speed. To rule out the point $dx$ as a limit for
the stochastic process (6.2) would appear to require generalizing the
nonconvergence result Theorem 2.17 to the infinite-dimensional setting. It turns out,
however, that the finite-dimensional projection $\mu \mapsto \int_{S^1} x\,d\mu$ maps the process
to a stochastic approximation process in the unit disk, that is, the evolution of
$\int_{S^1} x\,d\mu$ depends on $\mu$ only through $\int_{S^1} x\,d\mu$. For the projected process, 0 is an
unstable equilibrium, whence $dx$ is almost surely not a limit point of $\{\pi_t\}$. By
Corollary 6.5 the limit set of the process is the periodic orbit. In fact there is
a random variable $Z \in S^1$ such that

$$\|\pi_t - \rho_{\log t + Z}\| \to 0.$$

This precise result relies on shadowing theorems such as [Ben99, Theorem 8.9].



6.2. Self-avoiding walks 

A path of finite length on the integer lattice is said to be self-avoiding if its
vertices are distinct. Such paths have been studied in the context of polymer
chemistry beginning with [Flo49], where nonrigorous arguments were given to
show that the diameter of a polymer chain of length $n$ in three-space would
be of order $n^\nu$ for some $\nu$ greater than the value of 1/2 predicted by a simple
random walk model. Let $\Omega_n$ denote the set of self-avoiding paths in $\mathbb{Z}^d$ of
length $n$ starting from the origin. Surprisingly, good estimates on the number
of such paths are still not known. Hammersley and Morton [HM54] observed
that $|\Omega_n|$ is sub-multiplicative: concatenation is a bijection between $\Omega_j \times \Omega_k$
and a set containing $\Omega_{j+k}$. It follows that $|\Omega_n|^{1/n}$ converges to $\inf_k |\Omega_k|^{1/k}$. The
connective constant $\mu = \mu_d$, defined to be the value of this limit in $\mathbb{Z}^d$, is
not known, though rigorous estimates place $\mu_2 \in [2.62, 2.70]$ and nonrigorous
estimates claim great precision. It is not known, though widely believed, that
in any dimension, $|\Omega_{n+1}|/|\Omega_n| \to \mu_d$; for closed loops Kesten [Kes63] did show
that $|\Omega_{n+2}|/|\Omega_n| \to \mu_d^2$.
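
For small $n$ the sets $\Omega_n$ can be enumerated directly, which makes the sub-multiplicativity and the slow decrease of $|\Omega_n|^{1/n}$ visible. The following is a minimal sketch for the square lattice $\mathbb{Z}^2$ using exhaustive depth-first search; the function name and the cutoff $n \le 8$ are illustrative choices.

```python
def count_self_avoiding_paths(n_max):
    """Return [|Omega_0|, ..., |Omega_n_max|] on Z^2 by exhaustive search."""
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    counts = [0] * (n_max + 1)

    def extend(pos, visited, length):
        counts[length] += 1
        if length == n_max:
            return
        for dx, dy in steps:
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt not in visited:
                visited.add(nxt)
                extend(nxt, visited, length + 1)
                visited.remove(nxt)  # backtrack

    extend((0, 0), {(0, 0)}, 0)
    return counts

counts = count_self_avoiding_paths(8)
# Sub-multiplicativity: |Omega_{j+k}| <= |Omega_j| * |Omega_k|.
submultiplicative = all(
    counts[j + k] <= counts[j] * counts[k]
    for j in range(len(counts))
    for k in range(len(counts) - j)
)
# Each root is an upper bound for the connective constant mu_2.
roots = [counts[n] ** (1.0 / n) for n in range(1, len(counts))]
```

The running time grows like $|\Omega_n|$ itself, which is why exact enumeration is limited to modest lengths and why sampling methods are needed beyond that.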

Let $U_n$ denote the uniform measure on $\Omega_n$. Given that the cardinality of $\Omega_n$
is poorly understood, it is not surprising that $U_n$ is also poorly understood.
In dimensions five and higher, a substantial body of work by Hara and Slade
has established the convergence under rescaling of $U_n$ to Brownian motion,
convergence of $|\Omega_{n+1}|/|\Omega_n|$, and values of several exponents and constants. Their
technique is to use asymptotic expansions known as lace expansions, based
on numbers of various sub-configurations in the path. See [MS93] for a
comprehensive account of work up to 1993 or Slade's piece in the Mathematical
Intelligencer [Sla94] for a nontechnical overview.

In dimensions 2, 3 and 4, very little is rigorously known. Nevertheless, there
are many conjectures, such as the existence and supposed values of a diffusion
exponent $\nu = \nu_d$ for which the $U_n$-expected square distance between the endpoints
of the path (usually denoted $\langle R_n^2 \rangle$) is of order $n^{2\nu}$. Absent rigorous results, the
measure $U_n$ has been investigated by simulation, but even that is difficult. The
exponential growth of $\Omega_n$ prevents sampling from $U_n$ in any direct way once $n$ is
of order, say, 100.

Various Monte Carlo sampling schemes have been proposed. Berretti and
Sokal [BS85] suggest a Markov Chain Monte Carlo algorithm, each step of
which either extends or retracts the path by one edge. Adjusting the relative
probabilities of extension and retraction produces a Markov chain whose
stationary distribution approximates a mixture of the measures $U_n$ and which
approaches this distribution in polynomial time, provided certain conjectures
hold and parameters have been correctly adjusted. Randall and Sinclair take
this a step further, building into the algorithm foolproof tests of both of these
provisions [RS00].
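
An extend-or-retract chain of this kind can be sketched as follows. This is an illustration in the spirit of [BS85], not a transcription of it: the specific move probabilities are an assumption, chosen so that detailed balance holds for weights proportional to `beta ** length`, which gives a mixture of the uniform measures $U_n$ when `beta` is below the reciprocal of the connective constant.

```python
import random

def extend_retract_chain(beta, n_iter, dim=2, seed=0):
    """Variable-length Markov chain on self-avoiding paths from the origin."""
    rng = random.Random(seed)
    q = 2 * dim  # number of lattice directions
    directions = []
    for axis in range(dim):
        for sign in (1, -1):
            step = [0] * dim
            step[axis] = sign
            directions.append(tuple(step))
    origin = (0,) * dim
    path = [origin]
    occupied = {origin}
    lengths = []
    p_extend = q * beta / (1.0 + q * beta)
    for _ in range(n_iter):
        if rng.random() < p_extend:
            d = rng.choice(directions)
            new = tuple(a + b for a, b in zip(path[-1], d))
            if new not in occupied:  # reject extensions that self-intersect
                path.append(new)
                occupied.add(new)
        elif len(path) > 1:
            occupied.remove(path.pop())  # retract the last edge
        lengths.append(len(path) - 1)
    return path, lengths

path, lengths = extend_retract_chain(beta=0.3, n_iter=2000)
```

To check detailed balance, note that a path of length $n$ moves to a given valid extension with probability `p_extend / q`, which equals `beta / (1 + q * beta)`, while the reverse retraction has probability `1 / (1 + q * beta)`; the ratio is `beta`, as required.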

Of relevance to this survey are dynamic reinforcement schemes to produce 
self-avoiding or nearly self-avoiding random walks. It should be mentioned that 
there is no consensus on what measure should properly be termed the infinite 
self-avoiding random walk. If Un converges weakly then the limit is a candidate 
for such a walk. Two other ideas, discussed below, are to make the self-avoiding 
constraint soft, then take a limit, and to get rid of self-intersection by erasing
loops as they form. 



'True' self-avoiding random walk 

For physicists, it is natural to consider the constraint of self-avoidance to be the 
limit of imposing a finite penalty for each self-intersection. In such a formulation, 
the probability of a path $\gamma$ is proportional to $e^{-\beta H(\gamma)}$ where the energy $H(\gamma)$
is the sum of the penalties.

A variant on this is to develop the walk dynamically via

$$P(X_{n+1} = y \mid X_n = x, \mathcal{F}_n) = \frac{e^{-\beta N(y,n)}}{\sum_{z \sim x} e^{-\beta N(z,n)}}$$

where $N(z,n)$ is the number of visits to $z$ up to time $n$. This does not yield the
same measure as the soft-constraint ensemble, but it has the advantage that it
extends to a measure on infinite paths. Such a random walk was first considered
by [APP83] and given the unfortunate name true self-avoiding walk. For
finite inverse temperature $\beta$, this object is nontrivial in one dimension as well
as in higher dimensions, and most of what is rigorously known pertains to one
dimension. Tóth [Tót95] proves a number of results. His penalty function counts
the number of pairs of transitions across the same edge, rather than the number
of pairs of times the walk is at the same vertex, but is otherwise the same as
[APP83].
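
The vertex-based dynamics are simple to simulate in one dimension. The sketch below is a minimal illustration, assuming the walk starts at 0 and that the starting vertex counts as one visit; the function name and these conventions are assumptions, not taken from [APP83].

```python
import math
import random

def true_self_avoiding_walk(beta, n_steps, seed=0):
    """Walk on Z choosing neighbor y with weight exp(-beta * N(y, n))."""
    rng = random.Random(seed)
    visits = {0: 1}  # N(z, n): number of visits to z up to the current time
    x = 0
    path = [x]
    for _ in range(n_steps):
        n_left = visits.get(x - 1, 0)
        n_right = visits.get(x + 1, 0)
        # P(right) = e^{-beta n_right} / (e^{-beta n_left} + e^{-beta n_right}),
        # written as a logistic function and clamped to avoid overflow.
        diff = max(-700.0, min(700.0, beta * (n_right - n_left)))
        p_right = 1.0 / (1.0 + math.exp(diff))
        x = x + 1 if rng.random() < p_right else x - 1
        visits[x] = visits.get(x, 0) + 1
        path.append(x)
    return path

walk = true_self_avoiding_walk(beta=1.0, n_steps=1000)
```

Replacing the vertex counts by counts of edge crossings would give an edge-repelled variant of the same scheme.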



In the terminology of this survey, we have an ERRW [Tót95] or VRRW [APP83]
of sequence type, with sequence $F(n) = e^{-\beta n}$. Tóth calls this exponential
self-repulsion. In a subsequent paper the dynamics are generalized
to subexponential self-repulsion $F(n) = e^{-\beta n^\kappa}$, with $0 < \kappa < 1$. The papers
[Tót96; Tót97] then consider polynomial reinforcement $F(n) = n^\alpha$. When
$\alpha < 0$ this is self-repulsion and when $\alpha > 0$ it is self-attraction. The following
results give a glimpse into this substantial body of work, concentrating on the
case of self-repulsion. An overview, which includes the case of self-attraction,
may be found in the survey [Tót99]. For technical reasons, instead of $X_n$ in the
first result, a random stopping time $\theta(N) = \theta(\lambda, N)$ is required. Define $\theta(N)$
to be a geometric random variable with mean $\lambda N$, independent of all other
variables.



Theorem 6.13 (see [Tót99, Theorem 1.4]). Let $\{X_n\}$ be a sequence-type ERRW
with sequence $F(n)$ equal to one of the following, and define the constant $\nu$ in
each case as shown.

1. $F(n) = \exp(-\beta n)$, $\nu = 2/3$

2. $F(n) = \exp(-\beta n^\kappa)$, $\nu = \frac{\kappa + 2}{2(\kappa + 1)}$

3. $F(n) = n^{-\alpha}$, $\nu = \frac{1}{2}$

where $\alpha$ is a positive constant and $0 < \kappa < 1$. There is a one-parameter family
of distribution functions $\{G_\lambda(t) : \lambda > 0\}$ such that as $N \to \infty$,

$$P(N^{-\nu} X_{\theta(N)} < x) \to G_\lambda(x).$$

The limit distribution $G_\lambda$ is not Gaussian even when $\nu = 1/2$. □

Tóth also proves a Ray-Knight theorem for the local time spent on each edge.
As expected the time scaling is governed by the exponent $\gamma = (1 - \nu)/\nu$.

Loop-erased random walk



Lawler [Law80] introduced a new way to generate a random self-avoiding path.
Assume the dimension $d$ is at least 3. Inductively, we suppose that at time $n$, a
self-avoiding walk from the origin $\gamma_n := (x_0, x_1, \ldots, x_k) \in \Omega_k$ has been chosen.
Let $X_{n+1}$ be chosen uniformly from the neighbors of $x_k$, independently of what
has come before. At time $n + 1$, if $X_{n+1}$ is distinct from all $x_j$, $0 \le j \le k$,
then $\gamma_{n+1}$ is taken to be $(x_0, \ldots, x_k, X_{n+1})$. If not, then $\gamma_{n+1}$ is taken to be
$(x_0, \ldots, x_r)$ for the unique $r \le k$ such that $x_r = X_{n+1}$ (we allow $\gamma_{n+1}$ to
become the empty sequence if $r = 0$). In other words, the final loop in the path
$(x_0, \ldots, x_r, x_{r+1}, \ldots, X_{n+1})$ is erased. In dimension three and higher, $|X_n| \to \infty$,
and hence for each $k$ the first $k$ steps of $\gamma_n$ are eventually constant. The limiting
path $\gamma$ is therefore well defined on a set of probability 1 and is a deterministic
function, the loop-erasure of the simple random walk path $X_0, X_1, X_2, \ldots$,
denoted LE(X). The loop-erased random walk measure, LERW, is defined to be
the law of $\gamma$.
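
The inductive construction above amounts to chronological loop erasure, which for a finite stretch of walk takes only a few lines. This is a minimal sketch; the function name is an illustrative choice, and the input is any finite sequence of hashable vertices rather than an infinite random walk path.

```python
def loop_erase(walk):
    """Chronological loop erasure of a finite walk (list of hashable vertices)."""
    erased = []
    index = {}  # vertex -> its position in `erased`
    for v in walk:
        if v in index:
            # Revisit: erase the loop, i.e. everything after the earlier visit to v.
            r = index[v]
            for u in erased[r + 1:]:
                del index[u]
            del erased[r + 1:]
        else:
            index[v] = len(erased)
            erased.append(v)
    return erased

example = loop_erase([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0), (0, -1)])
```

In the example the walk returns to the origin after four steps, so the whole square loop is erased before the final step is appended. Applied to longer and longer initial segments of a transient walk, the first $k$ steps of the output eventually stop changing, which is the stabilization used to define LE(X).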

The process $\gamma = \mathrm{LE}(X)$ seems to have little to do with reinforcement until one
sees the following alternate description. Let $\{Y_n : n \ge 0\}$ be defined inductively
by $Y_0 = 0$ and

$$P(Y_{n+1} = y \mid Y_0, \ldots, Y_n) = \frac{h(y)}{\sum_{z \sim Y_n} h(z)} \qquad (6.4)$$

where $h(z)$ is the probability that a simple random walk beginning at $z$ avoids
$\{Y_0, \ldots, Y_n\}$ forever, with $h(z) := 0$ if $z = Y_k$ for some $k \le n$. Lawler
observed [Law91, Proposition 7.3.1] that $\{Y_n\}$ has law LERW. Thus one might
consider LERW to be an infinitely negatively reinforced VRRW that sees the
future. Moreover, altering (6.4) by conditioning on avoiding the past for time
$M$ instead of forever, then letting $M \to \infty$ gives a definition of LERW in two
dimensions that agrees with the loop-erasing construction when both are stopped
at random times.

The law of $\gamma = \mathrm{LE}(X)$ is completely different from the laws $U_n$ and their
putative limits, yet has some very nice features that make it worthy of study.
It is time reversible, so for example the loop erasure of a random walk from $a$
conditioned to hit $b$ and stopped when it hits $b$ has the same law if $a$ and $b$ are
switched. The loop-erased random walk on an arbitrary graph is also intimately
related to an algorithm of Aldous and Broder [Ald90] for choosing a spanning
tree uniformly. In dimensions five and above, LERW behaves the same way as the
self-avoiding measure $U_n$, rescaling to a Brownian motion, but in dimensions 2,
3 and 4, it has different connectivity and diffusion exponents from $U_n$.



6.3. Continuous time limits of self-avoiding walks 

Both the 'true' self-avoiding random walk and the loop-erased random walk 
have continuous limiting processes that are very pretty. The chance to spend a 
few paragraphs on each of these was a large part of my reason for including the 
entire section on negative reinforcement.

The 'true' self-repelling motion

The true self-avoiding random walk with exponential self-repulsion was shown
in Theorem 6.13 (part 1) to have a limit law for its time-$t$ marginal. In fact it
has a limit as a process. Most of this is shown in the paper [TW98], with a key
tightness result added in [NR06]. Some properties of this limit process $\{X_t\}$ are
summarized as follows. In particular, having 3/2-variation it is not a diffusion.

• The process $\{X_t\}$ has continuous paths.

• It is recurrent.

• It is self-similar:

$$\{X_t\} \stackrel{\mathcal{D}}{=} \{\alpha^{-2/3} X_{\alpha t}\}.$$

• It has non-trivial local variation of order 3/2.

• The occupation measure at time $t$ has a density; this may be called the
local time $L_t(x)$.

• The pair $(X_t, L_t(\cdot))$ is a Markov process.

To construct this process and show it is the limit of the exponentially
repulsive true self-avoiding walk, Tóth and Werner rely on the Ray-Knight theory
developed in [Tót95]. While technical statements would involve too much
notation, the gist is that the local time at the edge $\{k, k+1\}$ converges under
re-scaling, not only for fixed $k$ but as a process in $k$. A strange but convenient
choice is to stop the process when the occupation time on an edge $z$ reaches $m$.
The joint occupations of the other edges $\{j, j+1\}$ then converge, under suitable
rescaling, to a Brownian motion started at time $z$ and position $m$ and absorbed
at zero once the time parameter is positive; if $z < 0$ it is reflected at zero until
then. When reading the previous sentence, be careful, as Ray-Knight theory has
a habit of switching space and time.

Because this holds separately for each pair $(z,m) \in \mathbb{R} \times \mathbb{R}^+$, the limiting
process $\{X_t\}$ may be constructed in the strong sense by means of coupled
coalescing Brownian motions $\{B_{z,m}(t) : t \ge z\}_{z \in \mathbb{R},\, m \in \mathbb{R}^+}$. These coupled Brownian
motions are jointly limits of coupled simple random walks. On this level, the
description is somewhat less technical, as follows.

Let $V_e$ denote the even vertices of $\mathbb{Z} \times \mathbb{Z}^+$. For each $(z,m) \in V_e$, flip an
independent fair coin to determine a single directed edge from $(z,m)$ to
$(z+1, m \pm 1)$; the exception is when $m = 1$; then for $z < 0$ there is an edge
$\{(z,1), (z+1,2)\}$ while for $z \ge 0$ there is a v-shaped edge $\{(z,1), (z+1,0), (z+2,1)\}$.
Traveling rightward, one sees coalescing simple random walks, with absorption
at zero once time is positive. A picture of this is shown. If one uses the even
sites and travels leftward, one obtains a dual, distributed as a reflection (in time)
of the original coalescing random walks. The complement of the union of the
coalescing random walks and the dual walks is topologically a single path. Draw
a polygonal path down the center of this path: the $z$-values when the center line
crosses an integer level form a discrete process $\{Y_n\}$.

This process $\{Y_n\}$ is a different process from the true self-avoiding walk we
started with, but it has some other nice descriptions, discussed in [TW98,
Section 11]. In particular, it may be described as an "infinitely negatively edge-
reinforced random walk with initial occupation measure alternating between
zero and one". To be more precise, give nearest neighbor edges of $\mathbb{Z}$ weight 1 if
their center is at $\pm(1/2 + 2k)$ for $k = 0, 1, 2, \ldots$. Thus the two edges adjacent
to zero are both labeled with a one, and, going away from zero in either
direction, ones and zeros alternate. Now do a random walk that always chooses the
less traveled edge, flipping a coin in the case of a tie (each crossing of an edge
increases its weight by one).
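
The "less traveled edge" rule just described is easy to simulate. The following is a minimal sketch, assuming the walk starts at 0 and that the edge $\{i, i+1\}$ is indexed by its left endpoint $i$; the function names are illustrative and not taken from [TW98].

```python
import random

def initial_weight(i):
    """Initial weight of edge {i, i+1}: 1 if its center i + 1/2 is +/-(1/2 + 2k)."""
    if i >= 0:
        return 1 if i % 2 == 0 else 0  # centers 1/2, 5/2, 9/2, ...
    return 1 if i % 2 == 1 else 0      # centers -1/2, -5/2, -9/2, ...

def least_traveled_edge_walk(n_steps, seed=0):
    """Walk on Z that always crosses the adjacent edge of smaller weight."""
    rng = random.Random(seed)
    weights = {}  # current weight of edge {i, i+1}, keyed by left endpoint i

    def weight(i):
        return weights.get(i, initial_weight(i))

    x = 0
    path = [x]
    for _ in range(n_steps):
        left, right = weight(x - 1), weight(x)
        go_left = left < right or (left == right and rng.random() < 0.5)
        edge = x - 1 if go_left else x
        weights[edge] = weight(edge) + 1  # each crossing adds one to the edge weight
        x = x - 1 if go_left else x + 1
        path.append(x)
    return path

walk = least_traveled_edge_walk(1000)
```

Because the two edges at the origin both start with weight one, the first step is decided by a coin flip, after which the alternating initial weights and the accumulated crossings determine most moves.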

The process $\{Y_n\}$ converges when rescaled to the process $\{X_t\}$ which is the
scaling limit of the true self-avoiding walk. The limit operation in this case
is more transparent: the coalescing simple random walks turn into coalescing
Brownian motions. These Brownian motions are the local time processes given
by the Ray-Knight theory. The construction of the process $\{X_t\}$ in [TW98] is
in fact via these coalescing Brownian motions.



The Stochastic Loewner Equation 

Suppose that the loop-erased random walk has a scaling limit. For specificity, 
it will be convenient to use the time reversal property of LERW and think of 
the walk as beginning on the boundary of a large disk and conditioned to hit 
the origin before returning to the boundary of the disk. The recursive $h$-process
formulation (6.4) indicates that the infinitesimal future of such a limiting path
would be a Brownian motion conditioned to avoid the path it has traced so 
far. Such conditioning, even if well defined, would seem to be complicated. But 
suppose, which is known about unconditioned Brownian motion and widely 
believed about many scaling limits, that the limiting LERW is conformally
invariant. The complement of the infinite past is simply connected, hence by the
Riemann Mapping Theorem, it is conformally homeomorphic to the open unit 
disk with the present location mapping to a boundary point. The infinitesimal 
future in these coordinates is a Brownian motion conditioned immediately to 
enter the interior of the disk and stay there until it hits the origin. If we could 
compute in these coordinates, such conditioning would be routine. 

In 2000, Schramm [Sch00] observed that such a conformal map may be computed
via the classical Löwner equation. This is a differential equation satisfied
by the conformal maps between a disk and the complement of a growing
path inward from the boundary of the disk. More precisely, let $\beta$ be a compact
simple path in the closed unit disk with one endpoint at zero and the other
endpoint being the only point of $\beta$ on $\partial U$. Let $q : (-\infty, 0] \to \beta \setminus \{0\}$ be a
parametrization of $\beta \setminus \{0\}$ and for each $t \le 0$, let

$$f(t,z) : U \to U \setminus q([t,0]) \qquad (6.5)$$

be the unique conformal map fixing 0 and having positive real derivative at 0.
Löwner [Löw23] proved that



Theorem 6.14 (Löwner's Slit Mapping Theorem). Given $\beta$, there is a
parametrization $q$ and a continuous function $g : (-\infty, 0] \to \partial U$ such that the function
$f : U \times (-\infty, 0] \to U$ in (6.5) satisfies the partial differential equation

$$\frac{\partial f}{\partial t} = z\,\frac{g(t) + z}{g(t) - z}\,\frac{\partial f}{\partial z} \qquad (6.6)$$

with initial condition $f(z,0) = z$.

The point $q(t)$ is a boundary point of $U \setminus q([t,0])$, so it corresponds under the
Riemann map $f(t,\cdot)$ to a point on $\partial U$. It is easy to see this must be $g(t)$. Imagine
that $\beta$ is the scaling limit of LERW started from the origin and stopped when it
hits $\partial U$ (recurrence of two-dimensional random walk forces us to use a stopping
construction). Since a Brownian motion conditioned to enter the interior of the
disk has an angular component that is a simple Brownian motion, it is not too
great a leap to believe that $g$ must be a Brownian motion on the circumference
$\partial U$, started from an arbitrary point, let us say 1. The solution to (6.6) exists
for any $g$, that is, given $g$, we may recover the path $q$. We may then plug in for
$g$ a Brownian motion with $E B_t^2 = \kappa t$ for some scale parameter $\kappa$. We obtain
what is known as the radial SLE$_\kappa$.

More precisely, for any $\kappa > 0$, any simply connected open domain $D$, and
any $x \in \partial D$, $y \in D$, there is a unique process SLE$_\kappa(D; x, y)$ yielding a path
$\beta$ as above from $x$ to $y$. We have constructed SLE$_\kappa(U; 1, 0)$. This is sufficient
because SLE$_\kappa$ is invariant under conformal maps of the triple $(D; x, y)$. Letting
$y$ approach $z \in \partial D$ gives a well defined limit known as chordal SLE$_\kappa(D; x, z)$.

Lawler, Schramm and Werner have over a dozen substantial papers describing
SLE$_\kappa$ for various $\kappa$ and using SLE to analyze various scaling limits and solve
some longstanding problems. A number of properties are proved in [RS05]. For
example, SLE$_\kappa$ is always a path, is self-avoiding if and only if $\kappa \le 4$, and is
space-filling when $\kappa \ge 8$. Regarding the question of whether SLE is the scaling
limit of LERW, it was shown in [Sch00] that if LERW has a scaling limit and
this is conformally invariant, then this limit is SLE$_2$. The conformally invariant
limit was confirmed just a few years later:

Theorem 6.15 ([LSW04, Theorem 1.3]). Two-dimensional LERW stopped at
the boundary of a disk has a scaling limit and this limit is conformally invariant.
Consequently, the limit is SLE$_2$.

In the same paper, Lawler, Schramm and Werner show that the Peano curve
separating an infinite uniform spanning tree from its dual has SLE$_8$ as its
scaling limit. The SLE$_6$ is not self-avoiding, but its outer boundary is, up to an
inessential transformation, the same as the outer boundary of a two-dimensional
Brownian motion run until a certain stopping time. A recently announced result
of Smirnov is that the interface between positive and negative clusters of the
two-dimensional Ising model is an SLE$_3$. It is conjectured that the scaling limit
of the classical self-avoiding random walk is SLE$_{8/3}$, the conjecture following if
such a scaling limit can be proved to exist and be conformally invariant.